Abstract. We present a categorical denotational semantics for database mappings
based on views, in the most general framework of database integration/exchange.
The database category DB, with databases as objects and view-based mappings as
morphisms between them, differs from the Set category: the morphisms (based on a
set of complex query computations) are not functions, while the objects are
database instances (sets of relations). Logic-based schema mappings between
databases, usually written in a highly expressive logical language (e.g., LAV,
GAV, GLAV mappings, or tuple-generating dependencies), may be functorially
translated into this "computation" category DB. A new approach is adopted, based
on the behavioral point of view for databases, and behavioral equivalences for
databases and their mappings are established. By introducing view-based
observations for databases, which are computations without side effects, we
define a fundamental (Universal algebra) monad with a power-view endofunctor T.
The resulting 2-category DB is symmetric, so that any mapping can be represented
as an object (database instance) as well, where a higher-level mapping between
mappings is a 2-cell morphism. The database category DB has the following
properties: it is equal to its dual, complete and cocomplete. Special attention
is devoted to practical examples: a query definition, a query rewriting in a GAV
database-integration environment, and the fixpoint solution of a canonical
data-integration model.



1 Introduction 

Most work in the data integration/exchange and P2P framework is based on a logical
point of view (particularly for the integrity constraints, in order to define the
right models for certain answers) in a 'local' mode (source-to-target database),
where the general 'global' problem of composing complex partial mappings that
involve a number of databases has not been given proper attention. Today, this
'global' approach cannot be avoided because of the necessity of P2P open-ended
networks of heterogeneous databases. The aim of this work is to define a category
DB for database mappings that is more suitable than the Set category: databases
are more complex structures than sets, and the mappings between them are too
complex to be represented by a single (complete) function. Why do we need an
enriched categorical semantic domain such as this for databases? We will try to
give a simple answer to this question:

This work is an attempt to give a correct solution to the general problem of
complex database mappings and of high-level algebraic operators for databases
(merging, matching, etc.), preserving the traditional common-practice logical
language for schema database mapping definitions.

The query-rewriting algorithms are not integral parts of a database theory (used
to define a database schema with integrity constraints); they are programs, and
we need an enriched context that is able to formally express these programs
through mappings between databases as well.

Let us consider, for example, P2P systems or mappings in a complex data
warehouse: formally, we would like to make synthetic graphic representations of
database mappings and queries, and to develop a graphic tool for a meta-mapping
description of complex (and partial) mappings in various contexts, with a formal
mathematical background.

Only a few works have considered this general problem [1,2,3,4]. One of them,
which uses category theory [2], is too restrictive: its institutions can be
applied only to inclusion mappings between databases.

There is a lot of work on sketch-based denotational semantics for databases
[5,6,7,8]. But all of them use, as objects of a sketch category, the elements of
an ER scheme of a database (relations, attributes, etc.) and not the whole
database as a single object, which is what we need in a framework of
inter-database mappings. It was shown in [9] that if we want to progress to more
expressive sketches with respect to the original Ehresmann sketches for diagrams
with limits and coproducts, by eliminating non-database objects such as cartesian
products of attributes or powerset objects, we need more expressive arrows for
sketch categories (diagram predicates in [9] that are analogous to the approach
of Makkai in [10]). Obviously, when we progress to a more abstract vision where
objects are the (whole) databases, following the approach of Makkai, in this new
basic category DB for databases, where objects are just the database instances
(each object is a set of relations that compose this database instance), we will
obtain much more complex arrows, as we will see. Such arrows are not simple
functions, as in the case of the base Set category, but complex trees (operads)
of view-based mappings. In this way, while Ehresmann's approach prefers to deal
with a few fixed diagram properties (commutativity, (co)limitness), we enjoy the
possibility of setting a full relational-algebra signature of diagram properties.

This work is an attempt to give a correct solution to this problem while
preserving the traditional common-practice logical language for schema database
mapping definitions. Different properties of this DB category are considered in a
number of previously published papers [11,12,13,14,15] as well.

The plan of this paper is as follows: In Section 2 we present an Abstract Object
Type based on view-based observations. In Section 3 we develop a formal
definition of the database category DB, its power-view endofunctor, and its
duality property. In Section 4 we formulate two equivalence relations for
databases (objects in the DB category): a strong and a weak observation
equivalence. Finally, in Section 5 we present an application of this theory to
data integration/exchange systems, with an example of query rewriting in a data
integration system, and we define a fixpoint operator for an infinite canonical
solution in data integration/exchange systems.

1.1 Technical Preliminaries

The database mappings, for a given logical language, are usually defined at a
schema level, as follows:

- A database schema is a pair $\mathcal{A} = (S_A, \Sigma_A)$ where: $S_A$ is a
countable set of relation symbols $r \in R$ with finite arity, disjoint from a
countable infinite set att of attributes (for any $x \in$ att, a domain of $x$ is
a nonempty subset $dom(x)$ of a countable set of individual symbols dom, disjoint
from att), such that for any $r \in R$, the sort of $r$ is a finite sequence of
elements of att. $\Sigma_A$ denotes a set of closed formulas called integrity
constraints, of the sorted first-order language with sorts att, constant symbols
dom, relational symbols $R$, and no function symbols.

A finite database schema is composed of a finite set $S_A$, so that the set of
all attributes of such a database is finite.

- An instance of a database $\mathcal{A}$ is given by $A = (\mathcal{A}, I_A)$,
where $I_A$ is an interpretation function that maps each schema element of $S_A$
($n$-ary predicate) into an $n$-ary relation $a_i \in A$ (also called an "element
of $A$"). Thus, a relational instance-database $A$ is a set of $n$-ary relations.

- We consider a rule-based conjunctive query over a database schema $\mathcal{A}$
as an expression $q(\mathbf{x}) \longleftarrow R_1(\mathbf{u}_1), \dots, R_n(\mathbf{u}_n)$,
where $n > 0$, $R_i$ are the relation names (at least one) in $\mathcal{A}$ or
the built-in predicates (e.g., $<$, etc.), $q$ is a relation name not in
$\mathcal{A}$, and $\mathbf{u}_i$ are free tuples (i.e., they may use either
variables or constants). Recall that if $\mathbf{v} = (v_1, \dots, v_m)$ then
$R(\mathbf{v})$ is a shorthand for $R(v_1, \dots, v_m)$. Finally, each variable
occurring in $\mathbf{x}$ must also occur at least once in
$\mathbf{u}_1, \dots, \mathbf{u}_n$. Rule-based conjunctive queries (called
rules) are composed of a subexpression $R_1(\mathbf{u}_1), \dots, R_n(\mathbf{u}_n)$,
that is the body, and $q(\mathbf{x})$, that is the head of this rule. If one can
find values for the variables of the rule such that the body holds (i.e., is
logically satisfied), then one may deduce the head-fact. This concept is captured
by a notion of "valuation". In the rest of this paper a deduced head-fact will be
called "a resulting view of a query $q(\mathbf{x})$ defined over a database
$\mathcal{A}$". Recall that the conjunctive queries are monotonic and
satisfiable. The Yes/No conjunctive queries are the rules with an empty head.

- We consider that a mapping between two databases $\mathcal{A}$ and
$\mathcal{B}$ is expressed by a union of "conjunctive queries with the same
head". Such mappings are called "view-based mappings". Consequently, we consider
a view of an instance-database $A$ to be an $n$-ary relation (set of tuples)
obtained by a "select-project-join + union" (SPJRU) query $q(\mathbf{x})$ (it is
a term of the SPJRU algebra) over $A$: if this query is a finite term of this
algebra then it is called a "finitary view" (a finitary view can also have an
infinite number of tuples).
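The notion of a valuation for a rule-based conjunctive query can be illustrated with a small sketch. It is only a naive illustration under stated assumptions, not an algorithm from this paper: variables are strings starting with `?` (a convention chosen here), built-in predicates are omitted, and the names `evaluate_rule`, `evaluate_view_mapping`, `emp` and `dept` are hypothetical.

```python
from itertools import product

def is_var(term):
    # Convention of this sketch only: variables are strings starting with '?'.
    return isinstance(term, str) and term.startswith("?")

def evaluate_rule(head_vars, body, database):
    """Evaluate q(x) <- R1(u1), ..., Rn(un) over an instance-database.

    head_vars: tuple of head variables x.
    body: list of (relation_name, free_tuple) atoms.
    database: dict mapping relation names to sets of tuples.
    Returns the resulting view as a set of tuples.
    """
    view = set()
    # Try every combination of one stored tuple per body atom.
    for rows in product(*(database[name] for name, _ in body)):
        valuation = {}
        consistent = True
        for (_, free_tuple), row in zip(body, rows):
            for term, value in zip(free_tuple, row):
                if is_var(term):
                    # Bind the variable on first use, then require equal values.
                    if valuation.setdefault(term, value) != value:
                        consistent = False
                elif term != value:
                    # A constant in a free tuple must match the stored value.
                    consistent = False
        if consistent:
            view.add(tuple(valuation[v] for v in head_vars))
    return view

def evaluate_view_mapping(rules, database):
    """A union of conjunctive queries with the same head."""
    result = set()
    for head_vars, body in rules:
        result |= evaluate_rule(head_vars, body, database)
    return result

# Hypothetical instance-database A with two relations.
A = {
    "emp": {("ann", "sales"), ("bob", "it")},
    "dept": {("sales", "rome"), ("it", "oslo")},
}
# q(?n, ?c) <- emp(?n, ?d), dept(?d, ?c)
rule = (("?n", "?c"), [("emp", ("?n", "?d")), ("dept", ("?d", "?c"))])
works_in = evaluate_rule(rule[0], rule[1], A)
```

The enumeration of all tuple combinations is exponential in the number of body atoms; it is chosen here only because it mirrors the definition of a valuation directly.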

We consider the views as a universal property for databases: they are the
possible observations of the information contained in an instance-database, and
we may use them in order to establish an equivalence relation between databases.
The database category DB, which will be introduced in what follows, is at an
instance level, i.e., any object in DB is an instance-database (i.e., a set of
relations). The connection between the logical (schema) level and this
computational category is based on interpretation functors. Thus, each rule-based
conjunctive query at the schema level over a database $\mathcal{A}$ will be
translated (by an interpretation functor) into a morphism in DB, from an
instance-database $A$ (a model of the database schema $\mathcal{A}$) to the
instance-database composed of all views of $A$.

In what follows we will work with typed operads, first developed for the purposes
of homotopy theory [16,17,18], having a set $R$ of types (each relation symbol is
a type), or "R-operads" for short. The basic idea of an R-operad $O$ is that,
given types $r_1, \dots, r_k, r \in R$, there is a set $O(r_1, \dots, r_k, r)$ of
abstract $k$-ary "operations" with inputs of type $r_1, \dots, r_k$ and output of
type $r$. We can visualize such an operation as a tree with only one node. In an
operad, we can obtain new operations from old ones by composing them: this can be
visualized in terms of trees (Fig. 1).

Fig. 1. Operations of an R-operad (panels: an operation; a composition of operations)

We can also obtain new operations from old ones by permuting arguments, and there
is a unary "identity" operation of each type. Finally, we insist on a few
plausible axioms: the identity operations act as identities for composition,
permuting arguments is compatible with composition, and composition is
associative. Thus, formally, we have the following:

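Definition 1. For any set $R$, an R-operad $O$ consists of

1. for any $r_1, \dots, r_k, r \in R$, a set $O(r_1, \dots, r_k, r)$

2. for any $f \in O(r_1, \dots, r_k, r)$ and any
$g_1 \in O(r_{11}, \dots, r_{1i_1}, r_1), \dots, g_k \in O(r_{k1}, \dots, r_{ki_k}, r_k)$,
an element $f \cdot (g_1, \dots, g_k) \in O(r_{11}, \dots, r_{1i_1}, \dots, r_{k1}, \dots, r_{ki_k}, r)$

3. for any $r \in R$, an element $1_r \in O(r, r)$

4. for any permutation $\sigma \in R_k$, a map
$\sigma : O(r_1, \dots, r_k, r) \to O(r_{\sigma(1)}, \dots, r_{\sigma(k)}, r)$,
$f \mapsto f\sigma$, such that:

(a) whenever both sides make sense,
$f \cdot (g_1 \cdot (h_{11}, \dots, h_{1i_1}), \dots, g_k \cdot (h_{k1}, \dots, h_{ki_k})) = (f \cdot (g_1, \dots, g_k)) \cdot (h_{11}, \dots, h_{1i_1}, \dots, h_{k1}, \dots, h_{ki_k})$

(b) for any $f \in O(r_1, \dots, r_k, r)$, $f = 1_r \cdot f = f \cdot (1_{r_1}, \dots, 1_{r_k})$

(c) for any $f \in O(r_1, \dots, r_k, r)$ and $\sigma, \sigma_1 \in R_k$,
$f(\sigma\sigma_1) = (f\sigma)\sigma_1$

(d) for any $f \in O(r_1, \dots, r_k, r)$, $\sigma \in R_k$ and
$g_1 \in O(r_{11}, \dots, r_{1i_1}, r_1), \dots, g_k \in O(r_{k1}, \dots, r_{ki_k}, r_k)$,
$(f\sigma) \cdot (g_{\sigma(1)}, \dots, g_{\sigma(k)}) = (f \cdot (g_1, \dots, g_k))\rho(\sigma)$,
where $\rho : R_k \to R_{i_1 + \dots + i_k}$ is the obvious homomorphism.

(e) for any $f \in O(r_1, \dots, r_k, r)$,
$g_1 \in O(r_{11}, \dots, r_{1i_1}, r_1), \dots, g_k \in O(r_{k1}, \dots, r_{ki_k}, r_k)$,
and $\sigma_1 \in R_{i_1}, \dots, \sigma_k \in R_{i_k}$,
$f \cdot (g_1\sigma_1, \dots, g_k\sigma_k) = (f \cdot (g_1, \dots, g_k))\rho_1(\sigma_1, \dots, \sigma_k)$,
where $\rho_1 : R_{i_1} \times \dots \times R_{i_k} \to R_{i_1 + \dots + i_k}$ is
the obvious homomorphism.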

Let us define the "R-algebra" of an operad, where its abstract operations are
represented by actual functions (query-functions). For a given database schema
with relation symbols $r_1, \dots, r_k$ we consider $f \in O(r_1, \dots, r_k, r)$
as a conjunctive query $r \longleftarrow r_1, \dots, r_k$ that defines a view $r$.

Definition 2. For any R-operad $O$, an R-algebra $\alpha$ consists of:

1. for any $r \in R$, a set $\alpha(r)$, which is a set of tuples of this type
(relation). $\alpha^*$ is the extension of $\alpha$ to a list of symbols:
$\alpha^*(\{r_1, \dots, r_k\}) = \{\alpha(r_1), \dots, \alpha(r_k)\}$.

2. for any $q \in O(r_1, \dots, r_k, r)$, a mapping function
$\alpha(q) : \alpha(r_1) \times \dots \times \alpha(r_k) \to \alpha(r)$, such that

(a) whenever both sides make sense,
$\alpha(q \cdot (q_1, \dots, q_k)) = \alpha(q)(\alpha(q_1) \times \dots \times \alpha(q_k))$

(b) for any $r \in R$, $\alpha(1_r)$ acts as an identity on $\alpha(r)$

(c) for any $q \in O(r_1, \dots, r_k, r)$ and a permutation $\sigma \in R_k$,
$\alpha(q\sigma) = \alpha(q)\sigma$, where $\sigma$ acts on the function
$\alpha(q)$ on the right by permuting its arguments.

3. we introduce the two functions, $\partial_0$ and $\partial_1$, such that for
any $\alpha(q)$, $q \in O(r_1, \dots, r_k, r)$, we have that
$\partial_0(q) = \{r_1, \dots, r_k\}$,
$\partial_0(\alpha(q)) = \{\alpha(r_1), \dots, \alpha(r_k)\}$,
$\partial_1(q) = \{r\}$, and $\partial_1(\alpha(q)) = \{\alpha(r)\}$.

Consequently, we can think of an operad as a simple sort of theory, used to
define schema mappings between databases, and of its algebras as models of this
theory, used to define the mappings between instance-databases, where a mapping
$\alpha$ is considered as an interpretation of relation symbols of a given
database schema.
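As a rough illustration of Definitions 1 and 2, the following sketch pairs each abstract operation with the function that interprets it, and implements the composition $f \cdot (g_1, \dots, g_k)$ together with $\partial_0$ and $\partial_1$. Permutations are left out, and all names (`Operation`, `compose`, `d0`, `d1`, the relation types `emp`, `dept`, and so on) are assumptions of this sketch rather than notation from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Operation:
    """An operation in O(r1, ..., rk, r) bundled with its interpretation alpha(q)."""
    inputs: Tuple[str, ...]        # types r1, ..., rk
    output: str                    # type r
    fn: Callable[..., frozenset]   # alpha(r1) x ... x alpha(rk) -> alpha(r)

def identity(r):
    """The unary identity operation 1_r."""
    return Operation((r,), r, lambda rel: rel)

def compose(f, gs):
    """f . (g1, ..., gk): feed the output of each g_j into the j-th input of f."""
    if tuple(g.output for g in gs) != f.inputs:
        raise TypeError("outputs of g_1..g_k must match the inputs of f")
    arities = [len(g.inputs) for g in gs]

    def fn(*args):
        results, pos = [], 0
        for g, n in zip(gs, arities):
            results.append(g.fn(*args[pos:pos + n]))
            pos += n
        return f.fn(*results)

    inputs = tuple(t for g in gs for t in g.inputs)
    return Operation(inputs, f.output, fn)

def d0(q):
    """Set of input types of an operation."""
    return set(q.inputs)

def d1(q):
    """Singleton set with the output type of an operation."""
    return {q.output}

# Hypothetical query operations over relation types "emp" and "dept".
names = Operation(("emp",), "names", lambda emp: frozenset((n,) for n, _ in emp))
cities = Operation(("dept",), "cities", lambda dept: frozenset((c,) for _, c in dept))
pairs = Operation(("names", "cities"), "pairs",
                  lambda ns, cs: frozenset(n + c for n in ns for c in cs))

# alpha assigns a set of tuples to each relation type.
alpha = {"emp": frozenset({("ann", "sales")}), "dept": frozenset({("sales", "rome")})}
h = compose(pairs, [names, cities])
view = h.fn(alpha["emp"], alpha["dept"])
```

Here `h` has inputs `("emp", "dept")`, and evaluating it agrees with evaluating `pairs` on the results of `names` and `cities`, which is the content of condition 2(a) of Definition 2 for this instance.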

2 Data Object Type for query-answering database systems 

We consider the views as a universal property for databases: they are the possible ob- 
servations of the information contained in an instance-database, and we can use them 
in order to establish an equivalence relation between databases. 

In the theory of algebraic specifications, an Abstract Data Type (ADT) is
specified by a set of operations (constructors) that determine how the values of
the carrier set are built up, and by a set of formulae (in the simplest case,
equations) stating which values should be identified. In the standard initial
semantics, the defining equations impose a congruence on the initial algebra.
Dually, a coalgebraic specification of a class of systems, i.e., of Abstract
Object Types (AOT), is characterized by a set of operations (destructors) that
specify what can be observed out of a system state (i.e., an element of the
carrier), and how a state can be transformed to a successor state.

We start by introducing the class of coalgebras for database query-answering systems 
for a given instance-database (a set of relations) A. They are presented in an algebraic 
style, by providing a co-signature. In particular, the sorts include one single "hidden 
sort" corresponding to the carrier of the coalgebra, and other "visible" sorts, for inputs 
and outputs, that have a given fixed interpretation. Visible sorts will be interpreted as 
sets without any algebraic structure defined on them. For us, coalgebraic terms, built
only over destructors, are precisely interpreted as the basic observations that one can
make on the states of a coalgebra. 

Input sorts are considered as a set of unions of conjunctive queries
$q(\mathbf{x})$ for a given database $A$, where $\mathbf{x}$ is a tuple of
variables (attributes) of this query. Each query has an algebraic term of the
"select-project-join + union" algebraic query language (SPJRU, or equivalently
SPCU algebra, Chapters 4.5 and 5.4 in [?]) with a carrier equal to the set of
relations in $A$. We define the power-view operator $T$, with domain and codomain
equal to the set of all instance-databases, such that for any object (database)
$A$, the object $TA$ denotes a database composed of the set of all views of $A$.
The object $TA$, for a given instance-database $A$, corresponds to the
quotient-term algebra $\mathcal{L}_A/_{\approx}$, where the carrier is a set of
equivalence classes of closed terms of well-defined formulae of a relational
algebra. Such formulae are "constructed" by $\Sigma_R$-constructors (relational
operators in SPJRU algebra: select, project, join and union), by symbols
(attributes of relations) of a database instance $A$, and by constants of
attribute domains. More precisely, $TA$ is "generated" by this quotient-term
algebra $\mathcal{L}_A/_{\approx}$. For every object $A$ it holds that
$A \subseteq TA$ and $TA = TTA$, i.e., each view of the database instance $TA$ is
also a view of the database instance $A$. Notice that when $A$ is finitary (has a
finite number of relations) but has at least one relation with an infinite number
of tuples, then $TA$ has an infinite number of relations (views of $A$), and thus
can be an infinitary object. It is obvious that when the domain of constants of a
database is finite, then both $A$ and $TA$ are finitary objects. By default we
assume that the domain of every database is an arbitrarily large finite set. This
is a reasonable assumption for real applications.
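The two properties $A \subseteq TA$ and $TA = TTA$ say that $T$ behaves like a closure operator. The sketch below shows this on a deliberately restricted toy version of $T$ that closes a database only under projection and union; selection and join are omitted so that the closure stays small, so this is not the full SPJRU power-view operator. The function names and the sample database are assumptions of the sketch, and all relations are assumed to be non-empty.

```python
from itertools import combinations

def arity_of(rel):
    # Relations are non-empty frozensets of equal-length tuples.
    return len(next(iter(rel)))

def projections(rel):
    """All projections of rel onto non-empty, order-preserving column subsets."""
    arity = arity_of(rel)
    out = set()
    for k in range(1, arity + 1):
        for cols in combinations(range(arity), k):
            out.add(frozenset(tuple(t[c] for c in cols) for t in rel))
    return out

def power_view(db):
    """Toy T: least set of relations containing db, closed under projection and union."""
    views = set(db)
    while True:
        new = set()
        for rel in views:
            new |= projections(rel)
        for r, s in combinations(views, 2):
            if arity_of(r) == arity_of(s):
                new.add(r | s)
        if new <= views:          # fixpoint reached: nothing new was produced
            return frozenset(views)
        views |= new

# Hypothetical instance-database with one binary and one unary relation.
A = frozenset({frozenset({(1, 2)}), frozenset({(3,)})})
TA = power_view(A)
```

For this database the closure contains the binary relation itself plus the seven non-empty unions of the unary relations `{(1,)}`, `{(2,)}` and `{(3,)}`, and applying `power_view` to `TA` again returns `TA` unchanged.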

Consequently, the output sort of this database AOT is the set $TA$ of all
resulting views (resulting $n$-ary relations) obtained by computation of queries
$q(\mathbf{x})$ over a database $A$. It is considered as the carrier of a
coalgebra as well.

Definition 3. A co-signature for a Database query-answering system, for a given
instance-database $A$, is a triple $\mathcal{D}_\Sigma = (S, OP, [\_])$, where
$S$ are the sorts, $OP$ are the operators, and $[\_]$ is an interpretation of
visible sorts, such that:

1. $S = (X_A, \mathcal{L}_A, \Upsilon)$, where $X_A$ is a hidden sort (a set of
states of a database $A$), $\mathcal{L}_A$ is an input sort (a set of unions of
conjunctive queries), and $\Upsilon$ is an output sort (the set of all views of
all instance-databases).

2. $OP$ is a set of operations: a method $Next : X_A \times \mathcal{L}_A \to X_A$,
that corresponds to an execution of a next query $q(\mathbf{x}) \in \mathcal{L}_A$
in a current state of a database $A$, such that the database $A$ passes to the
next state; and $Out : X_A \times \mathcal{L}_A \to TA$, an attribute that
returns the obtained view of a database for a given query
$q(\mathbf{x}) \in \mathcal{L}_A$.

3. $[\_]$ is a function, mapping each visible sort to a non-empty set.

The Data Object Type for a query-answering system is given by a coalgebra
$\langle \lambda Next, \lambda Out \rangle : X_A \to X_A^{\mathcal{L}_A} \times TA^{\mathcal{L}_A}$
of the polynomial endofunctor
$(\_)^{\mathcal{L}_A} \times TA^{\mathcal{L}_A} : Set \to Set$, where $\lambda$
denotes the lambda abstraction for functions of two variables into functions of
one variable (here $Z^Y$ denotes the set of all functions from $Y$ to $Z$).

This separation between the sorts and their interpretations is made for
conceptual clarity: we will simply ignore it in the following by denoting both a
sort and the corresponding set by the same symbol. In object-oriented
terminology, the coalgebras are expressive enough to specify the parametric
methods and the attributes of database (conjunctive) query-answering systems. In
transition-system terminology, such coalgebras can model a deterministic,
non-terminating transition system with inputs and outputs. In [19] a complete
equational calculus for such coalgebras of a restricted class of polynomial
functors has been defined.
In the rest of this paper we will consider only database query-answering systems
without side effects: that is, the obtained results (views) will not be
materialized as a new relation of this database $A$. Thus, when a database
answers a query, it remains in the same initial state. Thus, the set $X_A$ is a
singleton $\{A\}$ for a given database $A$, and consequently it is isomorphic to
the terminal object $1$ in the Set category. As a consequence, from
$1^{\mathcal{L}_A} \simeq 1$, we obtain that the method $Next$ is just an
identity function $id : 1 \to 1$. Consequently, the only interesting part of this
AOT is the attribute part $Out : X_A \times \mathcal{L}_A \to TA$, with the fact
that $X_A \times \mathcal{L}_A = \{A\} \times \mathcal{L}_A \simeq \mathcal{L}_A$.
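In object-oriented terms, the side-effect-free case described above can be pictured as an object with a single state. The following is a minimal sketch under the assumption that a query is modelled as a Python callable from the database to a set of tuples; the class name `QueryAnsweringSystem` and the sample data are hypothetical.

```python
class QueryAnsweringSystem:
    """A database query-answering system without side effects.

    The hidden sort has a single state (the database itself), so the method
    `next` is the identity and only the attribute `out` carries information.
    """

    def __init__(self, database):
        # database: dict mapping relation names to frozensets of tuples
        self._database = database

    def next(self, query):
        # Answering a query never changes the state of the database.
        return self

    def out(self, query):
        # The observed view; it is not materialized as a new relation.
        return frozenset(query(self._database))

# Hypothetical database and query.
A = {"emp": frozenset({("ann", "sales"), ("bob", "it")})}
system = QueryAnsweringSystem(A)

def sales_staff(db):
    return {(name,) for name, dept in db["emp"] if dept == "sales"}

view = system.out(sales_staff)
```

Calling `system.next(sales_staff)` returns the same object, so repeated observations with `out` always give the same view.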

Consequently, we obtain an attribute mapping $Out : \mathcal{L}_A \to TA$, which
will be used as a semantic foundation for a definition of database mappings: for
any query $q_i(\mathbf{x}) \in \mathcal{L}_A$, the correspondent algebraic term
$\widehat{q_i}$ is a function (it is not a T-coalgebra)
$\widehat{q_i} : A^k \to TA$, where $A^k$ is the $k$-th cartesian product of $A$
and $r_{i1}, \dots, r_{ik} \in A$ are the relations used for computation of this
query. A view-mapping can now be defined as a T-coalgebra $q_{A_i} : A \to TA$,
that, obviously, is not a function. We introduce also the two functions
$\partial_0, \partial_1$ such that $\partial_0(q_{A_i}) = \{r_{i1}, \dots, r_{ik}\}$
and $\partial_1(q_{A_i}) = \{r_i\}$, with obtained view
$r_i = \|q_i(\mathbf{x})\| = \widehat{q_i}(r_{i1}, \dots, r_{ik})$. Thus, we can
formally introduce a theory for operads:

Definition 4. View-mapping: For any query over a schema $\mathcal{A}$ we can
define a schema map $q_i : \mathcal{A} \to T\mathcal{A}$, where
$q_i \in O(r_{i1}, \dots, r_{ik}, r_i)$,
$Q = (r_{i1}, \dots, r_{ik}) \subseteq \mathcal{A}$ and $r_i \in T\mathcal{A}$.

A correspondent view-map at instance level is
$q_{A_i} = \{\alpha(q_i), q_\perp\} : A \to TA$, with $A = \alpha^*(\mathcal{A})$,
$TA = \alpha^*(T\mathcal{A})$, $\partial_0(q_\perp) = \partial_1(q_\perp) = \{\perp\}$.
For simplicity, in the rest of this paper we will drop the component $q_\perp$ of
a view-map, and assume implicitly such a component; thus,
$\partial_0(q_{A_i}) = \alpha^*(Q) \subseteq A$ and
$\partial_1(q_{A_i}) = \{\alpha(r_i)\} \subseteq TA$ is a singleton with the
unique element equal to the view obtained by a "select-project-join+union" term
$\widehat{q_i}$.

3 Database category DB 

Based on an observational point of view for relational databases, we may
introduce a category DB [20] for instance-databases and view-based mappings
between them, with the set of its objects $Ob_{DB}$ and the set of its morphisms
$Mor_{DB}$, such that:

1. Every object (denoted by $A, B, C, \dots$) of this category is an
instance-database, composed of a set of $n$-ary relations $a_i \in A$,
$i = 1, 2, \dots$, also called "elements of $A$". We define a universal database
instance $\Upsilon$ as the union of all database instances, i.e.,
$\Upsilon = \{a_i \mid a_i \in A, A \in Ob_{DB}\}$. It is the top object of this
category.
A closed object in DB is an instance-database $A$ such that $A = TA$. We have
that $\Upsilon = T\Upsilon$, because every view $v \in T\Upsilon$ is an
instance-database as well, thus $v \in \Upsilon$. Vice versa, every element
$r \in \Upsilon$ is a view of $\Upsilon$ as well, thus $r \in T\Upsilon$.
Every object (instance-database) $A$ also has an empty relation $\perp$. The
object composed of only this empty relation is denoted by $\perp^0$, and we have
that $T\perp^0 = \perp^0 = \{\perp\}$. Any empty database (a database with only
empty relations) is isomorphic to this bottom object $\perp^0$.

2. Morphisms of this category are all possible mappings between
instance-databases based on views, as they will be defined by the formalism of
operads in what follows.

In what follows, the objects in DB (i.e., instance-databases) will also be called
simply databases, when it is clear from the context. Each atomic mapping
(morphism) in DB between two databases is generally composed of three components:
the first corresponds to a conjunctive query $q_i$ over a source database that
defines this view-based mapping; the second (optional) component $w_i$
"translates" the obtained tuples from the domain of the source database (for
example, in Italian) into terms of the domain of the target database (for
example, in English); and the last component $v_i$ defines which contribution of
this mapping is given to the target relation, i.e., a kind of
Global-or-Local-As-View (GLAV) mapping (sound, complete or exact).

Instead of the lists $(g_1, \dots, g_k)$ used for mappings in Definitions 1 and 2
we will use the sets $\{g_1, \dots, g_k\}$, because a mapping between two
databases does not depend on a particular permutation of its components. Thus, we
introduce an atomic morphism (mapping) between two databases as a set of simple
view-mappings:

Definition 5. Atomic morphism: Every schema mapping
$f_{Sch} : \mathcal{A} \to \mathcal{B}$, based on a set of query-mappings $q_i$,
is defined for a finite natural number $N$ by

$f_{Sch} = \{ v_i \cdot w_i \cdot q_i \mid q_i \in O(r_{i1}, \dots, r_{ik}, r_i'),\ w_i \in O(r_i', r_i''),\ v_i \in O(r_i'', r_i),\ \{r_{i1}, \dots, r_{ik}\} \subseteq \mathcal{A},\ r_i \in \mathcal{B},\ 1 \leq i \leq N \}$.

Its correspondent complete morphism at instance database level is
$f = \alpha^*(f_{Sch}) = \{ q_{A_i} = \alpha(v_i) \cdot \alpha(w_i) \cdot \alpha(q_i) \mid v_i \cdot w_i \cdot q_i \in f_{Sch} \} : A \to B$,
where:
Each $\alpha(q_i)$ is a query computation, with obtained view
$\alpha(r_i') \in TA$ for an instance-database
$A = \alpha^*(\mathcal{A}) = \{\alpha(r_k) \mid r_k \in \mathcal{A}\}$, and
$B = \alpha^*(\mathcal{B})$.

Each $\alpha(w_i) : \alpha(r_i') \to \alpha(r_i'')$, where
$\alpha(r_i'') \in TB$, is equal to the function determined by the symmetric
domain relation $r_{AB} \subseteq dom_A \times dom_B$ for the equivalent
constants in $\alpha^*(\mathcal{A})$ and $\alpha^*(\mathcal{B})$
($(a, b) \in r_{AB}$ means that $a \in dom_A$ and $b \in dom_B$ represent the
same entity of the real world, as required for a federated database environment)
as follows: for any $(a_1, \dots, a_n) \in \alpha(r_i')$ it holds that
$\alpha(w_i)(a_1, \dots, a_n) = (b_1, \dots, b_n)$, and for all
$1 \leq k \leq n$, $(a_k, b_k) \in r_{AB}$. If $r_{AB}$ is not defined, it is
assumed, by default, that $\alpha(w_i)$ is an identity function.

Let $P_{q_i}$ be a projection function on relations, for all attributes in
$\partial_1(\alpha(q_i)) = \{\alpha(r_i')\}$. Then, each
$\alpha(v_i) : \alpha(r_i'') \to \alpha(r_i)$ is one tuple-mapping function, used
to distinguish sound, complete and exact assumptions on the views, as follows:

1. inclusion case, when $\alpha(r_i'') \subseteq P_{q_i}(\alpha(r_i))$. Then for
any tuple $t \in \alpha(r_i'')$, $\alpha(v_i)(t) = t_1$, for some
$t_1 \in \alpha(r_i)$ such that $P_{q_i}(\{t_1\}) = t$.

We define $\|q_{A_i}\| \triangleq \alpha(r_i'')$, the extension of data
transmitted from an instance-database $A$ into $B$ by a component $q_{A_i}$.

2. inverse-inclusion case, when $\alpha(r_i'') \supseteq P_{q_i}(\alpha(r_i))$.
Then, for any tuple $t \in \alpha(r_i'')$, $\alpha(v_i)(t) = t_1$ if there exists
$t_1 \in \alpha(r_i)$ such that $P_{q_i}(\{t_1\}) = t$, and $\alpha(v_i)(t)$ is
the empty tuple otherwise.

We define $\|q_{A_i}\| \triangleq P_{q_i}(\alpha(r_i))$, the extension of data
transmitted from an instance-database $A$ into $B$ by a component $q_{A_i}$.

3. equal case, when both cases 1 and 2 are valid.

Notice that the components $\alpha(v_i), \alpha(w_i), \alpha(q_i)$ are not
morphisms in the DB category: only their functional composition is an atomic
morphism. Each atomic morphism is a complete morphism, that is, a set of
view-mappings. Thus, each view-map $q_{A_i} : A \to TA$, which is an atomic
morphism, is a complete morphism (the case when $B = TA$, $r_{AB}$ is not
defined, and $\alpha(v_i)$ belongs to the "equal case"), and by c-arrow we denote
the set of all complete morphisms.

Example 1: In the Local-as-View (LAV) mappings [21], the inverse-inclusion,
inclusion and equal cases correspond to the sound, complete and exact view
respectively. In the Global-as-View (GAV) mappings, the inverse-inclusion,
inclusion and equal cases correspond to the complete, sound and exact view
respectively.

□
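The three cases of Definition 5 and their reading in Example 1 can be checked mechanically on small relations. The sketch below assumes that relations are Python sets of tuples and that the projection $P_{q_i}$ is given by a tuple of column positions; the function names and the sample data are hypothetical.

```python
def project(relation, cols):
    """Projection of a relation onto the given column positions."""
    return {tuple(t[c] for c in cols) for t in relation}

def classify(view, target, cols):
    """Return 'inclusion', 'inverse-inclusion', 'equal', or None if neither holds."""
    projected = project(target, cols)
    inclusion = view <= projected        # view is contained in the projection
    inverse = view >= projected          # view contains the projection
    if inclusion and inverse:
        return "equal"
    if inclusion:
        return "inclusion"
    if inverse:
        return "inverse-inclusion"
    return None

def transmitted(view, target, cols):
    """Extension of data transmitted by a component, following cases 1 to 3."""
    case = classify(view, target, cols)
    if case in ("inclusion", "equal"):
        return set(view)
    if case == "inverse-inclusion":
        return project(target, cols)
    raise ValueError("the view and the target relation are not comparable")

# Reading of the cases according to Example 1.
ASSUMPTION = {
    "LAV": {"inverse-inclusion": "sound", "inclusion": "complete", "equal": "exact"},
    "GAV": {"inverse-inclusion": "complete", "inclusion": "sound", "equal": "exact"},
}

# Hypothetical data: a unary view and a binary target relation.
view = {("ann",), ("bob",)}
target = {("ann", 30)}
case = classify(view, target, cols=(0,))
```

With this data the view strictly contains the projection of the target relation, so the case is inverse-inclusion, and only the tuple `("ann",)` counts as transmitted.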

Remark: In the rest of this paper we will consider only empty domain relations
(i.e., when the $\alpha(w_i)$ are identity functions) and we will write
$r \in A$ also for $\alpha(r) \in \alpha^*(\mathcal{A})$, i.e., the name (type)
of a relation $r$ in $\mathcal{A}$ is used also for its extension (the set of
tuples of that relation), and $A$ for $\alpha^*(\mathcal{A})$ as well. Notice
that the functions $\partial_0$ and $\partial_1$ are different from the $dom$ and
$cod$ functions used for category arrows. Here $\partial_0$ specifies exactly the
subset of relations in a database $A$ used for a view-based mapping, while
$\partial_1$ defines the target relation in a database $B$ for this mapping.
Thus: $\partial_0(f) \subseteq dom(f) = A$, $\partial_1(f) \subseteq cod(f) = B$
(in the case when $f$ is a simple view-mapping, $\partial_1(f)$ is a singleton).
In fact, they are functions
$\partial_0, \partial_1 : Mor_{DB} \to \mathcal{P}(\Upsilon)$ (where
$\mathcal{P}$ is the powerset operation), such that for any morphism
$f : A \to B$ between databases $A$ and $B$, we have that
$\partial_0(f) \subseteq A$ and $\partial_1(f) \subseteq B$.

The Yes/No query $q_i$ over a database $A$ obviously does not transfer any
information to the target object $TA$. Thus, if the answer to such a query is
Yes, then this query is represented in the DB category as a mapping
$q_i : A \to TA$, such that the source relations in $\partial_0(q_i)$ are
non-empty and $\partial_1(q_i) = \{\perp\}$. The answer to such a query $q_i$ is
No iff (if and only if) such a mapping does not exist in this DB category.
□

We are now ready to give a formal definition of all morphisms in the category DB.
Generally, a composed morphism $h : A \to C$ is a general tree whose leaves are
not all in $A$: such a morphism is called an incomplete (or partial) p-arrow.

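Definition 6. Syntax: The following BNF defines the set of all morphisms in DB:
p-arrow := c-arrow | c-arrow $\circ$ c-arrow (for any two c-arrows $f : A \to B$
and $g : B \to C$)

morphism := p-arrow | c-arrow $\circ$ p-arrow (for any p-arrow $f : A \to B$
and c-arrow $g : B \to C$)

whereby the composition of two arrows, $f$ (incomplete) and $g$ (complete), gives
the following p-arrow $h = g \circ f : A \to C$

$h = g \circ f = \bigcup_{q_{B_j} \in \alpha^*(g_{Sch}) \,\&\, \partial_0(q_{B_j}) \cap \partial_1(f) \neq \emptyset} \{q_{B_j}\} \circ \bigcup_{\partial_1(q_{A_i}) \subseteq \partial_0(q_{B_j})} \{q_{A_i}(tree)\}$

$= \{ q_{B_j} \circ \{ q_{A_i}(tree) \mid \partial_1(q_{A_i}) \subseteq \partial_0(q_{B_j}) \} \mid q_{B_j} \in \alpha^*(g_{Sch}) \,\&\, \partial_0(q_{B_j}) \cap \partial_1(f) \neq \emptyset \}$

$= \{ q_{B_j}(tree) \mid q_{B_j} \in \alpha^*(g_{Sch}) \,\&\, \partial_0(q_{B_j}) \cap \partial_1(f) \neq \emptyset \}$

where $q_{A_i}(tree)$ is the tree of the morphisms of $f$ below $q_{A_i}$.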

We have analogous diagrams of schema mappings as well:

- For a morphism $f : A \to B$ in DB we have a syntactically identical schema
mapping arrow $f_{Sch} : \mathcal{A} \to \mathcal{B}$ without the interpretation
of its symbols (the composition of functions "$\circ$" is replaced by the
associative composition of operads "$\cdot$").

- A schema mapping graph $G$ is any subset of schema arrows.

Fig. 2. Composed tree (the trees $f_T$ and $g_T$ and their composition $h_T = g_T \circ f_T$)

Notice that the arrows (morphisms) in DB are not functions. Thus, DB is different 
from Set category. In order to explain the composition of morphisms let us consider 
the following example: 

Example 2: Let us consider the morphisms $f : A \to B$, $g : B \to C$, such that
$A = \{a_1, \dots, a_6\}$, $B = \{b_1, \dots, b_7\}$, $C = \{c_1, \dots, c_4\}$,
where $f = \{q_{A_1}, \dots, q_{A_4}\}$,
$\partial_0(q_{A_1}) = \{a_1, a_2\}$, $\partial_0(q_{A_2}) = \{a_2, a_3\}$,
$\partial_0(q_{A_3}) = \{a_4\}$, $\partial_0(q_{A_4}) = \{a_4, a_5\}$,
$\partial_1(q_{A_1}) = \{b_1\}$, $\partial_1(q_{A_2}) = \{b_2\}$,
$\partial_1(q_{A_3}) = \{b_3\}$, $\partial_1(q_{A_4}) = \{b_6\}$, and
$g = \{q_{B_1}, \dots, q_{B_3}\}$, with
$\partial_0(q_{B_1}) = \{b_1, b_4\}$, $\partial_0(q_{B_2}) = \{b_2, b_3\}$,
$\partial_0(q_{B_3}) = \{b_4, b_5\}$, $\partial_1(q_{B_1}) = \{c_1\}$,
$\partial_1(q_{B_2}) = \{c_2\}$, $\partial_1(q_{B_3}) = \{c_3\}$, that can be
represented by trees $f_T = f$ and $g_T = g$ and their sequential composition
$h_T$ (Fig. 2).

The composition of morphisms (Fig. 3) $h = g \circ f : A \to C$ may be
represented as a part of the tree $h_T$ that gives the information contribution
from the object $A$ (source) into the object $C$ (target of this composed
morphism). We have that $\partial_0(f) = \{a_1, a_2, a_3, a_4, a_5\}$,
$\partial_1(f) = \{b_1, b_2, b_3, b_6\}$,
$\partial_0(g) = \{b_1, b_2, b_3, b_4, b_5\}$,
$\partial_1(g) = \{c_1, c_2, c_3\}$, while
$\partial_0(h) = \partial_0(g \circ f) = \{a_1, a_2, a_3, a_4\} \neq \partial_0(f)$ and
$\partial_1(h) = \partial_1(g \circ f) = \{c_1, c_2\} \neq \partial_1(g)$.
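The values of $\partial_0(h)$ and $\partial_1(h)$ in this example can be recomputed from the data of $f$ and $g$ alone. The sketch below encodes each view-map only by its pair of sets ($\partial_0$, $\partial_1$) and keeps, for every component of $g$, the components of $f$ whose target lies among its sources; the dictionary encoding and the function names are assumptions of this sketch.

```python
def compose(g, f):
    """Component trees of h = g o f as {q_B name: [names of the q_A below it]}."""
    d1_f = set().union(*(target for _, target in f.values()))
    h = {}
    for q_b, (sources_b, _) in g.items():
        if sources_b & d1_f:   # q_B receives some contribution from f
            h[q_b] = sorted(q_a for q_a, (_, target_a) in f.items()
                            if target_a <= sources_b)
    return h

def d0(h, f):
    """Source relations that really contribute to the composed morphism."""
    return set().union(*(f[q_a][0] for subtree in h.values() for q_a in subtree))

def d1(h, g):
    """Target relations that receive a contribution."""
    return set().union(*(g[q_b][1] for q_b in h))

def is_complete(q_b, h, f, g):
    """A component is complete when every source of q_B is produced by f."""
    covered = set().union(*(f[q_a][1] for q_a in h[q_b]))
    return covered == g[q_b][0]

# Each view-map is encoded as (d0, d1), taken from Example 2.
f = {
    "qA1": ({"a1", "a2"}, {"b1"}),
    "qA2": ({"a2", "a3"}, {"b2"}),
    "qA3": ({"a4"}, {"b3"}),
    "qA4": ({"a4", "a5"}, {"b6"}),
}
g = {
    "qB1": ({"b1", "b4"}, {"c1"}),
    "qB2": ({"b2", "b3"}, {"c2"}),
    "qB3": ({"b4", "b5"}, {"c3"}),
}
h = compose(g, f)
```

In this encoding `qB3` is dropped because none of its sources is produced by $f$, and `qB1` is only partially fed (its source `b4` is not produced by $f$), in agreement with the values stated in the example.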
Fig. 3. Obtained partial morphism $h = g \circ f$ (panels: complete; partial)

Let us see, for example, the composition of the c-arrow $h : C \to D$ with the
composed arrow $g \circ f$ in the previous example, where
$D = \{d_1, \dots, d_4\}$, $h = \{q_{C_1}, q_{C_2}, q_{C_3}\}$,
$\partial_0(q_{C_1}) = \{c_2\}$, $\partial_1(q_{C_1}) = \{d_1\}$,
$\partial_0(q_{C_2}) = \{c_1, c_2, c_3\}$, $\partial_1(q_{C_2}) = \{d_2\}$,
$\partial_0(q_{C_3}) = \{c_1, c_4\}$, $\partial_1(q_{C_3}) = \{d_3\}$, with
$q_{B_2}(tree) = q_{B_2} \circ \{q_{A_2}, q_{A_3}\}$ a complete, and
$q_{B_1}(tree) = q_{B_1} \circ \{q_{A_1}, \_\}$ a partial (incomplete) component
of this tree, as represented in Fig. 4.

As we see, a composition of (complete) morphisms generally produces a partial
(incomplete) morphism (only a part of the tree $h_T$ represents a real
contribution from $A$ into $C$) with hidden elements (in the diagram of the
composed morphism $h$, the element $b_4$ is a hidden element). In such a
representation we "forget" the parts of the tree $g_T \circ f_T$ that are not
involved in the real information contribution of the composed mappings from the
source into the target object. So, we define the semantics of any morphism
$h : A \to C$ as an "information transmitted flux" from the source into the
target object. An "information flux" (denoted by $\widetilde{h}$) is a set of
views (so, it is an object in the DB category as well) which is "transmitted" by
a mapping.
□

Fig. 4. Composed morphism (panels: complete; partial)

In order to explain this concept of "information flux" let us consider a simple
morphism $f : A \to B$ from a database $A$ into a database $B$, composed of only
one view map based on a single query
$q(\mathbf{x}) \longleftarrow R_1(\mathbf{u}_1), \dots, R_n(\mathbf{u}_n)$, where
$n > 0$, $R_i$ are relation names (at least one) in $A$ or built-in predicates
(e.g., $<$, etc.), and $q$ is a relation name not in $A$. Then, for any tuple
$\mathbf{c}$ for which the body of this query is true, $q(\mathbf{c})$ must also
be true, that is, this tuple from a database $A$ "is transmitted" by this
view-mapping into one relation of database $B$. The set ($n$-ary relation) $Q$ of
all tuples that satisfy the body of this query will constitute the whole
information "transmitted" by this mapping. The "information flux" $\widetilde{f}$
of this mapping is the set $TQ$, that is, the set of all views (possible
observations) that can be obtained from the transmitted information of this
mapping.

Definition 7. We define the semantics of mappings by the function $B_T : Mor_{DB} \to Ob_{DB}$,
which, given any mapping morphism $f : A \to B$, returns the set of
views ("information flux") that are really "transmitted" from the source to the target
object.

1. For an atomic morphism, $\widetilde{f} = B_T(f) = T\{\|q_{A_i}\| \mid q_{A_i} \in f\}$.

2. Let $g : A \to B$ be a morphism with a flux $\widetilde{g}$, and $f : B \to C$ an atomic morphism
with flux $\widetilde{f}$ defined in point 1, then $\widetilde{f \circ g} = B_T(f \circ g) = \widetilde{f} \cap \widetilde{g}$.
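
The two points of Definition 7 can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the construction of the paper: the power-view operator T is approximated by closure under union of relations (the real T closes under all SPJRU queries), and the names `power_view`, `atomic_flux`, and `compose_flux` are hypothetical.

```python
from itertools import combinations


def power_view(relations):
    """Toy stand-in for T: close a set of relations under union.

    Assumption: the real operator closes under all SPJRU queries; union
    alone is used here only to get a small, finite closure operator.
    """
    rels = list(relations)
    views = {frozenset()}  # the empty relation is always observable
    for size in range(1, len(rels) + 1):
        for combo in combinations(rels, size):
            views.add(frozenset().union(*combo))
    return frozenset(views)


def atomic_flux(query_results):
    """Point 1: flux of an atomic morphism = T applied to its query results."""
    return power_view(query_results)


def compose_flux(f_flux, g_flux):
    """Point 2: flux of a composition = intersection of the two fluxes."""
    return f_flux & g_flux


# Three unary relations, each a frozenset of tuples (hypothetical data).
r1, r2, r3 = frozenset({(1,)}), frozenset({(2,)}), frozenset({(3,)})
g_flux = atomic_flux([r1, r2])  # g : A -> B transmits the views r1 and r2
f_flux = atomic_flux([r2, r3])  # f : B -> C transmits the views r2 and r3
fg_flux = compose_flux(f_flux, g_flux)
```

In this toy model only the empty relation and `r2` survive the composition, and the result is again closed under the toy operator, which mirrors the statement that an intersection of closed objects is closed.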

Thus we have the following fundamental property:

Proposition 1 The flux of any mapping morphism $f : A \to B$ is a closed object in DB, i.e.,
$\widetilde{f} = T\widetilde{f}$.

Proof: This proposition may be proved by structural induction; each atomic arrow is a
closed object ($T\widetilde{f} = T(T\{\|q_{A_i}\| \mid q_{A_i} \in f\}) = T\{\|q_{A_i}\| \mid q_{A_i} \in f\} = \widetilde{f}$), each arrow
is a composition of a number of complete arrows, and an intersection of closed objects is
always a closed object.

□ 

Remark: The "information flux" $\widetilde{f}$ of a given morphism (mapping) $f : A \to B$ is an
instance-database as well (its elements are the views defined by the formulae above),
thus, an object in DB: the minimal "information flux" is equal to the bottom object $\perp^0$,
so that, given any two database instances A, B in DB, there exists at least one arrow
(morphism) between them, $f : A \to B$, such that $\widetilde{f} = \perp^0$.

Proposition 2 The following properties for morphisms are valid:

1. each arrow $f : A \twoheadrightarrow B$ such that $\widetilde{f} = TB$ is an epimorphism

2. each arrow $f : A \hookrightarrow B$ such that $\widetilde{f} = TA$ is a monomorphism

3. each monic and epic arrow is an isomorphism, thus two objects A and B are isomorphic
iff TA = TB, i.e.,

$A \simeq B$ iff $TA = TB$

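Proof: 1. An arrow $f : A \to B$ is epic iff for any $h, g : B \to C$ holds
$(h \circ f = g \circ f) \Rightarrow (h = g)$, thus $(\widetilde{h} \cap \widetilde{f} = \widetilde{g} \cap \widetilde{f}) \Rightarrow (\widetilde{h} = \widetilde{g})$, which is satisfied by $\widetilde{f} = TB$
(because $\widetilde{h} \subseteq TB$ and $\widetilde{g} \subseteq TB$).

2. An arrow $f : A \to B$ is monic iff for any $h, g : C \to A$ holds
$(f \circ h = f \circ g) \Rightarrow (h = g)$, thus $(\widetilde{f} \cap \widetilde{h} = \widetilde{f} \cap \widetilde{g}) \Rightarrow (\widetilde{h} = \widetilde{g})$, which is satisfied by $\widetilde{f} = TA$
(because $\widetilde{h} \subseteq TA$ and $\widetilde{g} \subseteq TA$).

3. By 1 and 2, because an isomorphism is epic and monic, and vice versa: if f is monic
and epic then $\widetilde{f} = TA$ (2) and $\widetilde{f} = TB$ (1), thus TA = TB. It is enough to show
the isomorphism $A \simeq TA$: let us define the isomorphism $is_A : A \to TA$ and its
inverse $is_A^{-1} : TA \to A$,

$is_A = \bigcup_{\partial_1(q_{A_i}) = \{v\}\ \&\ v \in TA} \{q_{A_i}\}$, $\quad is_A^{-1} = \bigcup_{\partial_0(q_{TA_i}) = \partial_1(q_{TA_i})\ \&\ \partial_1(q_{TA_i}) = \{v\}\ \&\ v \in A} \{q_{TA_i}\}$

Thus, $\widetilde{is_A} = \widetilde{is_A^{-1}} = TA$, so it holds that $\widetilde{is_A^{-1} \circ is_A} = TA = \widetilde{id_A} = \widetilde{is_A \circ is_A^{-1}}$, i.e.,
$is_A^{-1} \circ is_A = id_A$ and $is_A \circ is_A^{-1} = id_{TA}$, thus $A \simeq TA$. Finally, $A \simeq TA = TB \simeq B$,
i.e., $A \simeq B$.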
□ 

Remark: Thus, we consider, for example, the real object (empty database instance) $\perp^0$
as the zero object (both terminal and initial) in DB (from any real object A in DB there is
a unique arrow from it into $\perp^0$ and its reversed arrow). Each arrow f with $\partial_0(f) = \{\perp\}$
or $\partial_1(f) = \perp$ has an empty flux, and thus does not give any information contribution to the
target database: for example, the Yes arrows in DB for Yes/No queries.
It is easy to verify that each empty database (with all empty relations) is isomorphic to
the zero object $\perp^0$.

In what follows we will show that any two isomorphic objects (databases) in DB are
observationally equivalent.

3.1 Interpretations of schema mappings 

The semantics of a mapping between two relational database schemas, $f : A \to B$, is
a constraint on the pairs of interpretations of A and B, and therefore specifies which
pairs of interpretations can co-exist, given the mapping (see also Q). We consider only
view-based mappings between schemas defined in the SQL language of SPJRU algebra,
i.e., when

(1) $f = \{q_{A_i}(x) \Rightarrow b_j(x)\}$, where $q_{A_i}(x)$ is a union of conjunctive queries over A
and $b_j$ is a relation symbol of a database schema B, or,

(2) $f = \{q_{A_i}(x) \Rightarrow q_{B_j}(x)\}$, where $q_{B_j}(x)$ is a union of conjunctive queries over
B. In this case the mapping f also involves a helper database schema C with a relation
$c_i(x)$ for each $q_{A_i}(x) \in f$, with two new database mappings, $f_{AC} : A \to C$ and
$f_{BC} : B \to C$, with $f_{AC} = \{q_{A_i}(x) \Rightarrow c_i(x)\}$ and $f_{BC} = \{q_{B_j}(x) \Rightarrow c_i(x)\}$.

The formula $e = q_{A_i}(x) \Rightarrow q_{B_j}(x)$ (logical implication between queries) means that
each tuple of the view obtained by the query $q_{A_i}(x)$ is also a tuple of the view obtained
by the query $q_{B_j}(x)$.
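
The containment reading of a mapping formula can be sketched in a few lines of Python. The sketch assumes a deliberately simple representation that is not part of the paper: an instance is a dictionary from relation names to sets of tuples, a query is a Python function from an instance to a set of tuples, and `satisfies`, `q_a`, and `q_b` are hypothetical names.

```python
def satisfies(mapping, source, target):
    """Check q_A(x) => q_B(x) for every pair of queries in the mapping.

    The implication holds when every tuple of the view computed over the
    source instance also occurs in the view computed over the target.
    """
    return all(q_a(source) <= q_b(target) for q_a, q_b in mapping)


# Hypothetical instances: relation name -> set of tuples.
A = {"emp": {("ann", "cs"), ("bob", "math")}}
B = {"person": {("ann",), ("bob",), ("eve",)}}


def q_a(db):
    """Projection of emp onto its first attribute."""
    return {(name,) for (name, _dept) in db["emp"]}


def q_b(db):
    """Identity query over person."""
    return set(db["person"])


mapping = [(q_a, q_b)]
ok = satisfies(mapping, A, B)
```

Here the pair of instances (A, B) can co-exist under the mapping, whereas a target instance that lacks one of the projected names would violate it.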

There is a fundamental functorial connection between schema mappings and
their models in the instance-level category DB. It is based on Lawvere's categorial
theories [22,23], which introduced a way of describing algebraic structures using categories
for theories, functors (into the base category Set, which we will substitute by the more
adequate category DB), and natural transformations for morphisms between models.
For example, Lawvere's seminal observation that the theory of groups is a category with a
group object, that a group in Set is a product-preserving functor, and that a morphism of
groups is a natural transformation of functors, was an original idea that was subsequently
extended in order to define the categorial semantics for different algebraic and
logic theories. This work is based on the theory of sketches, which are fundamentally
graphs enriched by other concepts such as (co)cones mapped by functors into (co)limits
of the base category Set. It was demonstrated that, for every sentence in basic logic,
there is a sketch with the same category of models, and vice versa [24]. Accordingly,
sketches are called graph-based logic and provide a very clear and intuitive specification
of computational data and activities. For any small sketch E the category of models
Mod(E) is an accessible category by Lair's theorem and a reflective subcategory of $Set^E$
by the Ehresmann-Kennison theorem. In what follows we will substitute the base category
Set by this new database category DB.

Proposition 3 Let Sch(G) be a schema category generated from a schema mapping
graph (sketch) G. Every interpretation R-algebra $\alpha$ has as its categorial correspondent
the functor (categorial model) $\alpha^* : Sch(G) \to DB$, defined as follows:

1. for any database schema $\mathcal{A} = \{a_1, \ldots, a_n\}$ (object in Sch(G)), where $a_i \in R$, $i = 1, \ldots, n$,
holds $A = \alpha^*(\mathcal{A}) = \{\alpha(a_1), \ldots, \alpha(a_n)\}$, i.e., A is an interpretation (logical
model) of a database schema $\mathcal{A}$.

2. for any schema mapping arrow $f : \mathcal{A} \to \mathcal{B}$, let $f_T$ be the tree structure of operads,
$f_T = \{f_1 \cdot g_1, \ldots, f_k \cdot g_k\}$, where each $f_i$ is a linear composition of operads,
then $\alpha^*(f) = \{\alpha(f_1) \circ \alpha^*(g_1), \ldots, \alpha(f_k) \circ \alpha^*(g_k)\}$, otherwise $\alpha^*(f) = \alpha(f_T)$.
Formally, the satisfaction of the mapping f is defined as follows: for each logical formula
$e \in f$, $(\alpha^*(\mathcal{A}), \alpha^*(\mathcal{B})) \models e$, that is, e is satisfied by a model
$\alpha^* \in Mod(Sch(G)) \subseteq DB^{Sch(G)}$.

Proof: This is easy to verify, based on the general theory of sketches [23]: each arrow in a
sketch (enriched schema mapping graph) G may be converted into a tree syntax structure
of some morphism in DB (a labeled tree without any interpretation), thus a sketch
G can be extended into a category Sch(G). (The composition of schema mappings in
the category Sch(G), where each mapping is a set of first-order logical formulas, can
be defined as a disjoint union.) The functor is only the simple extension of the interpretation
R-algebra function $\alpha$ to lists of symbols, as in Definition 5.
□ 

3.2 Power-view endofunctor T 

Let us extend the notion of the type operator T into a notion of the endofunctor in the DB
category:

Theorem 1 There exists an endofunctor $T = (T^0, T^1) : DB \to DB$, such that

1. for any object A, the object component $T^0$ is equal to the type operator T, i.e.,

$T^0(A) = TA$

2. for any morphism $f : A \to B$, the arrow component is defined by

$T(f) = T^1(f) = \bigcup_{\partial_0(q_{TA_i}) = \partial_1(q_{TA_i}) = \{v\}\ \&\ v \in \widetilde{f}} \{q_{TA_i} : TA \to TB\}$

3. The endofunctor T preserves the properties of arrows, i.e., if a morphism f has a
property P (monic, epic, isomorphic), then T(f) also has the same property: let
$P_{mono}$, $P_{epi}$ and $P_{iso}$ be the monomorphic, epimorphic and isomorphic properties
respectively, then the following formula is true:

$\forall (f \in Mor_{DB})\,(P_{mono}(f) \equiv P_{mono}(Tf)$ and $P_{epi}(f) \equiv P_{epi}(Tf)$ and $P_{iso}(f) \equiv P_{iso}(Tf))$

Proof: It is easy to verify that T is a 2-endofunctor and to see that T preserves the properties
of arrows: for example, if $P_{mono}(f)$ is true for an arrow $f : A \to B$, then $\widetilde{f} = TA$
and $\widetilde{Tf} = T\widetilde{f} = T(TA) = TA$, thus $P_{mono}(Tf)$ is true. Vice versa, if $P_{mono}(Tf)$ is
true then $\widetilde{Tf} = T\widetilde{f} = T(TA)$, i.e., $\widetilde{f} = TA$ and, consequently, $P_{mono}(f)$ is true.
□ 

The endofunctor T is a right and left adjoint to the identity functor $I_{DB}$, i.e., $T \simeq I_{DB}$.
Thus we have the equivalence adjunction $\langle T, I_{DB}, \eta^C, \eta \rangle$ with the unit $\eta^C : T \simeq I_{DB}$
(such that for any object A the arrow $\eta^C_A = \eta^C(A) = is_A^{-1} : TA \to A$) and the
counit $\eta : I_{DB} \simeq T$ (such that for any A the arrow $\eta_A = \eta(A) = is_A : A \to TA$), which are
isomorphic arrows in DB (by the duality theorem it holds that $\eta^C = \eta^{inv}$).
The function $T^1 : (A \to B) \to (TA \to TB)$ is not a higher-order function (arrows
in DB are not functions): thus, there is no corresponding monad-comprehension
for the monad T, which invalidates the thesis [Esl that "monads = monad-comprehensions".
It is only valid that "monad-comprehensions $\Rightarrow$ monads".

We have already seen that the views of a database can be seen as its observable computations:
what we need, to obtain an expressive power of computations in the category
DB, are the categorial computational properties, which, as is known, are based on monads:

Proposition 4 The power-view closure 2-endofunctor $T = (T^0, T^1) : DB \to DB$
defines the monad $(T, \eta, \mu)$ and the comonad $(T, \eta^C, \mu^C)$ in DB, such that $\eta : I_{DB} \simeq T$
and $\eta^C : T \simeq I_{DB}$ are natural isomorphisms, while $\mu : TT \to T$ and $\mu^C : T \to TT$
are equal to the natural identity transformation $id_T : T \to T$ (because T = TT).

Proof: It is easy to verify that all commutative diagrams of the monad
($\mu_A \circ \mu_{TA} = \mu_A \circ T\mu_A$, $\mu_A \circ \eta_{TA} = id_{TA} = \mu_A \circ T\eta_A$) and of the comonad are diagrams composed
of identity arrows. Notice that by duality we obtain $\eta_{TA} = T\eta_A = \mu_A^{inv}$.
□ 



3.3 Duality 

The following duality theorem tells us that, for any commutative diagram in DB, there
is also the same commutative diagram composed of equal objects and inverted equivalent
arrows. This "bidirectional" mapping property of DB is a consequence
of the fact that the composition of arrows is semantically based on the commutativity of
set intersection for the "information fluxes" of its arrows. Thus any limit diagram
in DB also has its "reversed" equivalent colimit diagram with equal objects, and any
universal property also has its equivalent couniversal property in DB.

Theorem 2 There exists the contravariant functor $S = (S^0, S^1) : DB \to DB$ such
that

1. $S^0$ is an identity function on objects.

2. for any arrow in DB, $f : A \to B$, we have $S^1(f) : B \to A$, such that $S^1(f) = f^{inv}$,
where $f^{inv}$ is an (equivalent) reversed morphism of f (i.e., $\widetilde{f^{inv}} = \widetilde{f}$),
$f^{inv} = is_A^{-1} \circ (Tf)^{inv} \circ is_B$ with

$(Tf)^{inv} = \bigcup_{\partial_0(q_{TB_j}) = \partial_1(q_{TB_j}) = \{v\}\ \&\ v \in \widetilde{f}} \{q_{TB_j} : TB \to TA\}$

3. The category DB is equal to its dual category $DB^{OP}$.

Proof: We have, from the definition of the reversed arrow, that
$\widetilde{f^{inv}} = \widetilde{is_A^{-1}} \cap \widetilde{(Tf)^{inv}} \cap \widetilde{is_B} = TA \cap \widetilde{(Tf)^{inv}} \cap TB = TA \cap \widetilde{Tf} \cap TB = TA \cap \widetilde{f} \cap TB = \widetilde{f}$.
The reversed arrow of any identity arrow is equal to it, and the compositional property for the
functor also holds (the intersection operator for "information fluxes" is commutative). Thus, the
contravariant functor is well defined.

It is convenient to represent this contravariant functor as a covariant functor $S : DB^{OP} \to DB$,
or a covariant functor $S^{OP} : DB \to DB^{OP}$. It is easy to verify that for the
compositions of these covariant functors hold $S S^{OP} = I_{DB}$ and $S^{OP} S = I_{DB^{OP}}$
w.r.t. the adjunction $\langle S, S^{OP}, \phi \rangle : DB^{OP} \to DB$, where $\phi$ is a bijection; for each
pair of objects A, B in DB we have the bijection of hom-sets, $\phi_{A,B} : DB(A, S(B)) \simeq DB^{OP}(S^{OP}(A), B)$,
i.e., $\phi_{A,B} : DB(A, B) \simeq DB(B, A)$, such that for any arrow
$f \in DB(A, B)$ holds $\phi_{A,B}(f) = S^1(f) = f^{inv}$. The unit and counit of this adjunction
are the identity natural transformations, $\eta_{OP} : I_{DB} \to S S^{OP}$ and $\varepsilon_{OP} : S^{OP} S \to I_{DB^{OP}}$
respectively, such that for any object A they return its identity arrow. Thus,
from this adjunction, we obtain that DB is isomorphic to its dual $DB^{OP}$; moreover
they are equal because they have the same objects and the same arrows.

□ 

Let us introduce the concepts for products and coproducts in DB category. 

Definition 8. The disjoint union of any two instance-databases (objects) A and B, denoted
by A + B, corresponds to two mutually isolated databases, where the two database
management systems are completely disjoint, so that it is impossible to compute
queries with relations from both databases.
The disjoint property for mappings is represented by the facts that
$\partial_0(f + g) = \partial_0(f) + \partial_0(g)$, $\partial_1(f + g) = \partial_1(f) + \partial_1(g)$.

Thus, for any database A, the replication of this database (over different DB servers)
can be denoted by the coproduct object A + A in this category DB.

Proposition 5 For any two databases (objects) A and B we have that T(A + B) =
TA + TB. Consequently A + A is not isomorphic to A.

Proof: We have that T(A + B) = TA + TB, directly from the fact that we are able to
define views only over relations in A or, alternatively, over relations in B. Analogously
$\widetilde{f + g} = \widetilde{f} + \widetilde{g}$, which is a closed object, that is, it holds that $T(\widetilde{f + g}) = T(\widetilde{f} + \widetilde{g}) = T\widetilde{f} + T\widetilde{g} = \widetilde{f} + \widetilde{g} = \widetilde{f + g}$.

From $T(A + A) = TA + TA \neq TA$ we obtain that A + A is not isomorphic to A.

□ 
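
A toy Python model may help to see why replication is not absorbed, i.e., why A + A differs from A. The sketch makes strong simplifying assumptions that are not in the paper: relations are frozensets of tuples, the power-view operator is approximated by closure under union, and the disjoint union is modelled by tagging each relation with the index of its database. The isolation of the two components is built into `power_view` by construction, so the code illustrates Proposition 5 rather than proving it. All function names are hypothetical.

```python
from itertools import combinations


def close_under_union(relations):
    """Toy closure of one isolated database: all unions of its relations."""
    rels = list(relations)
    views = {frozenset()}
    for size in range(1, len(rels) + 1):
        for combo in combinations(rels, size):
            views.add(frozenset().union(*combo))
    return frozenset(views)


def disjoint_union(*dbs):
    """A + B: tag every relation with the index of the database it came from."""
    return frozenset((i, rel) for i, db in enumerate(dbs) for rel in db)


def power_view(tagged_db):
    """Toy T on a disjoint union: views are computed inside one component only."""
    tags = {tag for tag, _ in tagged_db}
    return frozenset(
        (tag, view)
        for tag in tags
        for view in close_under_union(r for t, r in tagged_db if t == tag)
    )


A = frozenset({frozenset({(1,)})})
B = frozenset({frozenset({(2,)})})

lhs = power_view(disjoint_union(A, B))                          # T(A + B)
rhs = disjoint_union(close_under_union(A), close_under_union(B))  # TA + TB
replicated = power_view(disjoint_union(A, A))                   # T(A + A)
```

In this model `lhs` and `rhs` coincide, while `replicated` contains two tagged copies of every view of A and therefore cannot equal TA.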

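Notice that for coproducts it holds that $C + \perp^0 = \perp^0 + C \simeq C$, and for any arrow f
in DB, $f + \perp^1 \approx \perp^1 + f \approx f$, where $\perp^1$ is a banal empty morphism between
objects, such that $\partial_0(\perp^1) = \partial_1(\perp^1) = \perp^0$, with $\widetilde{\perp^1} = \perp^0$.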

We are ready now to introduce the duality property between coproducts and products in 
this DB category: 

Proposition 6 There exists an idempotent coproduct bifunctor $+ : DB \times DB \to DB$
which is a disjoint union operator for objects and arrows in DB.

The category DB is cocartesian with initial (zero) object $\perp^0$ and for every pair of
objects A, B it has a categorial coproduct A + B with monomorphisms (injections)
$in_A : A \hookrightarrow A + B$ and $in_B : B \hookrightarrow A + B$.

By the duality property we have that DB is also a cartesian category with a zero object $\perp^0$.
For each pair of objects A, B there exists a categorial product $A \times B$ with epimorphisms
(projections) $p_A = in_A^{inv} : A \times B \twoheadrightarrow A$ and $p_B = in_B^{inv} : A \times B \twoheadrightarrow B$, where the
product bifunctor is equal to the coproduct bifunctor, i.e., $\times \equiv +$.

Proof: 1. For any identity arrow $(id_A, id_B)$ in $DB \times DB$, where $id_A$, $id_B$ are the
identity arrows of A and B respectively, it holds that $\widetilde{id_A + id_B} = \widetilde{id_A} + \widetilde{id_B} = TA + TB = T(A + B) = \widetilde{id_{A+B}}$.
Thus, $+^1(id_A, id_B) = id_A + id_B = id_{A+B}$ is an identity
arrow of the object A + B.

2. For any given $k : A \to A_1$, $k_1 : A_1 \to A_2$, $l : B \to B_1$, $l_1 : B_1 \to B_2$, it holds that
$\widetilde{+^1(k_1, l_1) \circ +^1(k, l)} = \widetilde{+^1(k_1, l_1)} \cap \widetilde{+^1(k, l)} = \widetilde{k_1 \circ k} + \widetilde{l_1 \circ l} = \widetilde{+^1(k_1 \circ k, l_1 \circ l)}$
$= \widetilde{+^1((k_1, l_1) \circ (k, l))}$, thus $+^1(k_1, l_1) \circ +^1(k, l) = +^1((k_1, l_1) \circ (k, l))$.

3. Let us demonstrate the coproduct property of this bifunctor: for any two arrows
$f : A \to C$, $g : B \to C$, there exists a unique arrow $k : A + B \to C$, such that
$f = k \circ in_A$, $g = k \circ in_B$, where $in_A : A \hookrightarrow A + B$, $in_B : B \hookrightarrow A + B$ are the
injection (point to point) monomorphisms ($\widetilde{in_A} = TA$, $\widetilde{in_B} = TB$).

It is easy to verify that for any two arrows $f : A \to C$, $g : B \to C$, there is exactly
one arrow $k = e_C \circ (f + g) : A + B \to C$, where $e_C : C + C \twoheadrightarrow C$ is an epimorphism
(with $\widetilde{e_C} = TC$), such that $\widetilde{k} = \widetilde{f} + \widetilde{g}$.

□ 

The following proposition introduces the pullbacks (and pushouts, by duality) for the 
category DB. 

Proposition 7 For any given pair of arrows with the same codomain, $f : A \to C$ and
$g : B \to C$, there is a pullback with the fibred product $D = \widetilde{f} \cap \widetilde{g}$ (product of A and
B over C). By duality, for any pair of arrows with the same domain there is a pushout
as well.

DB is a complete and cocomplete category.



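Proof: We define the commutative diagram $f \circ h_A = g \circ h_B$, where $h_A : D \hookrightarrow A$ and
$h_B : D \hookrightarrow B$ are monomorphisms defined by $h_A = is_A^{-1} \circ in_{DA}$, $h_B = is_B^{-1} \circ in_{DB}$,
where $is_A^{-1} : TA \to A$, $is_B^{-1} : TB \to B$ are isomorphisms and $in_{DA} : D \hookrightarrow TA$,
$in_{DB} : D \hookrightarrow TB$ are monomorphisms, such that $\widetilde{h_A} = \widetilde{h_B} = \widetilde{f} \cap \widetilde{g} = D$.

Let us show that for any pair of arrows $l_A : E \to A$, $l_B : E \to B$, such that
$f \circ l_A = g \circ l_B$, there is a unique arrow $k : E \to D$ such that the pullback diagram
commutes, i.e., (a) that $l_A = h_A \circ k$ and $l_B = h_B \circ k$. In fact, it must hold that
$\widetilde{k} \subseteq TD = T(\widetilde{f} \cap \widetilde{g}) = \widetilde{f} \cap \widetilde{g} = \widetilde{h_A} = \widetilde{h_B}$. So, from the commutativity (a),
$\widetilde{l_A} = \widetilde{h_A} \cap \widetilde{k} = \widetilde{k}$ and $\widetilde{l_B} = \widetilde{h_B} \cap \widetilde{k} = \widetilde{k}$. Thus, for any other arrow $k_1 : E \to D$
that makes the diagram (a) commute it must hold that $\widetilde{k_1} = \widetilde{l_A} = \widetilde{l_B}$ and, consequently,
$\widetilde{k_1} = \widetilde{k}$, i.e., $k_1 = k$.
Consequently, DB is a cartesian category with a terminal object and pullbacks, thus
it is complete (has all limits). By duality we deduce that it is also cocomplete (has all
colimits).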
□ 

In order to explain these concepts in another way, we can see the colimits and limits as
a left and a right adjunction for the diagonal functor $\Delta : DB \to DB^J$ for any small
index category (i.e., a diagram) J. For any colimit functor $F : DB^J \to DB$ we have a
left adjunction to the diagonal functor $\langle F, \Delta, \eta_C, \varepsilon_C \rangle : DB^J \to DB$, with the colimit
object F(D) for any object (diagram) $D \in DB^J$ and the universal cone, a natural
transformation, $\eta_C : Id_{DB^J} \to \Delta F$. Then, by duality, the same functor F is also a
right adjoint to the diagonal functor (adjunction $\langle \Delta, F, \eta, \varepsilon \rangle : DB \to DB^J$), with
the limit object (equal to the colimit object above) F(D) and the universal cone (counit),
a natural transformation, $\varepsilon : \Delta F \to Id_{DB^J}$, such that $\varepsilon = \eta_C^{inv}$ and $\eta = \varepsilon_C^{inv}$.
Let us see, for example, the coproducts ($F = +$) and products ($F = \times \equiv +$). In that
case the diagram $D \in DB^J$ is just a diagram of two arrows with the same codomain.
We obtain for the universal cone unit $\eta_C(\langle A, B \rangle) : \langle A, B \rangle \to \langle A + B, A + B \rangle$
one pair of coproduct inclusion-monomorphisms $\eta_C(\langle A, B \rangle) = \langle in_A, in_B \rangle$,
where $in_A : A \hookrightarrow A + B$, $in_B : B \hookrightarrow A + B$. The universal cone counit of the product,
$\varepsilon(\langle A \times B, A \times B \rangle) : \langle A \times B, A \times B \rangle \to \langle A, B \rangle$, is a pair of product projection-epimorphisms
$\varepsilon(\langle A \times B, A \times B \rangle) = \langle p_A, p_B \rangle$, where $p_A : A \times B \twoheadrightarrow A$,
$p_B : A \times B \twoheadrightarrow B$, $A \times B = A + B$, $p_A = in_A^{inv}$, $p_B = in_B^{inv}$, as represented in the
following diagram:




Example 3: Let us verify that each object in DB is a limit of some equalizer and a
colimit of its dual coequalizer. In fact, for any object A and a "structure map" $h : TA \to A$
of a monadic T-algebra $\langle A, h \rangle$ derived from the monad $(T, \eta, \mu)$ (where $h \circ \eta_A = id_A$,
so that h is an isomorphism $h = \eta_A^{inv} = \eta_A^C$, i.e., $\widetilde{h} = TA = \widetilde{id_A}$), we obtain the
absolute coequalizer (by Beck's theorem, it is preserved by the endofunctor T, i.e., T
creates a coequalizer) with a colimit A, and, by duality, we obtain the absolute equalizer
with the limit A as well.




□

4 Equivalence relations for databases



We can introduce a number of different equivalence relations for instance-databases: 

- identity relation: Two instance-databases (sets of relations) A and B are identical
when the set identity A = B holds.

- behavioral equivalence relation: Two instance-databases A and B are behaviorally 
equivalent when each view obtained from a database A can also be obtained from 
a database B and viceversa. 

- weak observational equivalence relation: Two instance-databases A and B are 
weakly equivalent when each "certain" view (without Skolem constants) obtained 
from a database A can be also obtained from a database B and viceversa. 

It is also possible to define other kinds of equivalences for databases. In the rest of this 
chapter we will consider only the second and third equivalences defined above. 

4.1 The (strong) behavioral equivalence for databases 

Let us now consider the problem of how to define equivalent (categorically isomorphic) 
objects (database instances) from a behavioral point of view based on observations: as 
we see, each arrow (morphism) is composed by a number of "queries" (view-maps), 
and each query may be seen as an observation over some database instance (object of 
DB). Thus, we can characterize each object in DB (a database instance) by its behavior 
according to a given set of observations. Indeed, if one object A is considered as a black- 
box, the object TA is only the set of all observations on A. So, given two objects A and 
B, we are able to define the relation of equivalence between them based on the notion 
of the bisimulation relation. If the observations (resulting views of queries) of A and 
B are always equal, independent of their particular internal structure, then they look 
equivalent to an observer. 

In fact, any database can be seen as a system with a number of internal states that can be
observed by using query operators (i.e., programs without side-effects). Thus, databases
A and B are equivalent (bisimilar) if they have the same set of observations, i.e., when
TA is equal to TB:

Definition 9. The relation of (strong) behavioral equivalence '$\approx$' between objects
(databases) in DB is defined by

$A \approx B$ iff $TA = TB$

the equivalence relation for morphisms is given by,

$f \approx g$ iff $\widetilde{f} = \widetilde{g}$

This relation of behavioral equivalence between objects corresponds to the notion of
isomorphism in the category DB (see Proposition 2).
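
The definition can be tried out on a toy model. The Python sketch below assumes, purely for illustration, that the power-view operator is approximated by closure under union of relations (the real T closes under all SPJRU queries); `power_view` and `behaviorally_equivalent` are hypothetical names.

```python
from itertools import combinations


def power_view(relations):
    """Toy stand-in for T: all unions of the given relations, plus the empty one."""
    rels = list(relations)
    views = {frozenset()}
    for size in range(1, len(rels) + 1):
        for combo in combinations(rels, size):
            views.add(frozenset().union(*combo))
    return frozenset(views)


def behaviorally_equivalent(a, b):
    """A is equivalent to B iff both databases yield the same set of observations."""
    return power_view(a) == power_view(b)


r1, r2 = frozenset({(1,)}), frozenset({(2,)})
A = frozenset({r1, r2})
B = frozenset({r1, r2, r1 | r2})  # B materializes a view that A can already compute
C = frozenset({r1})
```

In this model A and B are different sets of relations, yet an observer restricted to the toy query language cannot tell them apart, whereas C is distinguishable from both.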

This equivalence relation $\approx$ for arrows may be given by an (interpretation)
function $B_T : Mor_{DB} \to Ob_{DB}$ (see Definition 7), such that $\approx$ is equal to the kernel
of $B_T$ ($\approx\ = \ker B_T$), i.e., this is a fundamental concept for categorial symmetry [26]:

Definition 10. Categorial symmetry:

Let C be a category with an equivalence relation $\approx\ \subseteq Mor_C \times Mor_C$ for its
arrows (the equivalence relation for objects is the isomorphism $\simeq\ \subseteq Ob_C \times Ob_C$)
such that there exists a bijection between equivalence classes of $\approx$ and $\simeq$, so that it is
possible to define a skeletal category |C| whose objects are defined by the image of
a function $B_T : Mor_C \to Ob_C$ with the kernel $\ker B_T =\ \approx$, and to define an
associative composition operator for objects $*$, for any fitted pair $g \circ f$ of arrows, by
$B_T(g) * B_T(f) = B_T(g \circ f)$.

For any arrow in C, $f : A \to B$, the object $B_T(f)$ in C, denoted by $\widetilde{f}$, is called
a conceptualized object.

Remark: This symmetry property allows us to consider all the properties of an arrow
(up to the equivalence) as properties of objects and their composition as well. Notice
that any two arrows are equal if and only if they are equivalent and have the same source
and target objects.

We have that in symmetric categories it holds that $f \approx g$ iff $\widetilde{f} \simeq \widetilde{g}$.
Let us introduce, for a category C and its arrow category $C \downarrow C$, an encapsulation
operator $J : Mor_C \to Ob_{C \downarrow C}$, that is, a one-to-one function such that for any arrow
$f : A \to B$, $J(f) = \langle A, B, f \rangle$ is its corresponding object in $C \downarrow C$, with its inverse
$\psi$ such that $\psi(\langle A, B, f \rangle) = f$.

We denote by $F_{st}, S_{nd} : (C \downarrow C) \to C$ the first and the second comma functorial
projections (for any functor $F : C \to D$ between categories C and D, we denote
by $F^0$ and $F^1$ its object and arrow components), such that for any arrow
$(k_1; k_2) : \langle A, B, f \rangle \to \langle A', B', g \rangle$ in $C \downarrow C$ (such that $k_2 \circ f = g \circ k_1$ in C), we have that
$F_{st}^0(\langle A, B, f \rangle) = A$, $F_{st}^1(k_1; k_2) = k_1$ and $S_{nd}^0(\langle A, B, f \rangle) = B$, $S_{nd}^1(k_1; k_2) = k_2$.

We denote by $\blacktriangle : C \to (C \downarrow C)$ the diagonal functor, such that for any object A in a
category C, $\blacktriangle^0(A) = \langle A, A, id_A \rangle$.

An important subset of symmetric categories are the Conceptually Closed and Extended
symmetric categories, as follows:

Definition 11. A conceptually closed category is a symmetric category C with a functor
$T_e = (T_e^0, T_e^1) : (C \downarrow C) \to C$ such that $T_e^0 = B_T \psi$, i.e., $B_T = T_e^0 J$, with a natural
isomorphism $\varphi : T_e \circ \blacktriangle \simeq I_C$, where $I_C$ is an identity functor for C.
C is an extended symmetric category if it also holds that $\tau^{-1} \bullet \tau = \psi$, for the vertical
composition of natural transformations $\tau : F_{st} \to T_e$ and $\tau^{-1} : T_e \to S_{nd}$.

Remark: It is easy to verify that in conceptually closed categories any arrow
f is equivalent to an identity arrow, that is, $f \approx id_{\widetilde{f}}$.
It is also easy to verify that in extended symmetric categories the following holds:
$\tau = (T_e^1(\tau_I F_{st}^0; \psi)) \bullet (\varphi^{-1} F_{st}^0)$, $\quad \tau^{-1} = (\varphi^{-1} S_{nd}^0) \bullet (T_e^1(\psi; \tau_I S_{nd}^0))$,
where $\tau_I : I_C \to I_C$ is an identity natural transformation (for any object A in C,
$\tau_I(A) = id_A$).

Example 4: Set is an extended symmetric category: given any function $f : A \to B$,
the conceptualized object of this function is the graph of this function (which is a
set), $\widetilde{f} = B_T(f) = \{(x, f(x)) \mid x \in A\}$.
The equivalence $\approx$ on morphisms (arrows) is defined by: two arrows f and g are equivalent,
$f \approx g$, iff they have the same graph.

The composition of objects $*$ is defined as the associative composition of binary relations
(graphs), $B_T(g \circ f) = \{(x, (g \circ f)(x)) \mid x \in A\} = \{(y, g(y)) \mid y \in B\} \circ \{(x, f(x)) \mid x \in A\} = B_T(g) * B_T(f)$.

Set is also conceptually closed by the functor $T_e$, such that for any object $J(f) = \langle A, B, f \rangle$,
$T_e^0(J(f)) = B_T(f) = \{(x, f(x)) \mid x \in A\}$, and for any arrow $(k_1; k_2) : J(f) \to J(g)$,
the component $T_e^1$ is defined by:

for any $(x, f(x)) \in T_e^0(J(f))$, $T_e^1(k_1; k_2)(x, f(x)) = (k_1(x), k_2(f(x)))$.
It is easy to verify the compositional property for $T_e^1$, and that $T_e^1(id_A; id_B) = id_{T_e^0(J(f))}$.
Set is also an extended symmetric category, such that for any object
$J(f) = \langle A, B, f \rangle$ in $Set \downarrow Set$, we have that $\tau(J(f)) : A \twoheadrightarrow B_T(f)$ is an epimorphism,
such that for any $x \in A$, $\tau(J(f))(x) = (x, f(x))$, while $\tau^{-1}(J(f)) : B_T(f) \hookrightarrow B$
is a monomorphism such that for any $(x, f(x)) \in B_T(f)$,
$\tau^{-1}(J(f))(x, f(x)) = f(x)$.

Thus, each arrow in Set is a composition of an epimorphism and a monomorphism.
□ 
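
The constructions of Example 4 are concrete enough to check mechanically. The following Python sketch uses two small, arbitrarily chosen functions (an assumption made only for illustration) and hypothetical helper names `graph`, `compose_graphs`, `tau`, and `tau_inv`. It verifies that the graph of a composition equals the relational composition of the graphs, and that f factors through its graph.

```python
def graph(f, domain):
    """Conceptualized object of f: the set of pairs (x, f(x)) for x in the domain."""
    return frozenset((x, f(x)) for x in domain)


def compose_graphs(g_graph, f_graph):
    """Relational composition: (x, z) whenever (x, y) is in f_graph and (y, z) in g_graph."""
    return frozenset(
        (x, z) for (x, y) in f_graph for (y2, z) in g_graph if y == y2
    )


A = {0, 1, 2, 3}
B = {0, 1}


def f(x):
    """f : A -> B, the parity of x."""
    return x % 2


def g(y):
    """g : B -> C, a shift by ten."""
    return y + 10


f_graph = graph(f, A)
g_graph = graph(g, B)


def tau(x):
    """tau(J(f)) : A -> graph of f."""
    return (x, f(x))


def tau_inv(pair):
    """tau^{-1}(J(f)) : graph of f -> B, the second projection."""
    return pair[1]


composed = compose_graphs(g_graph, f_graph)
```

The choice of relational composition for `*` is what makes the equation $B_T(g) * B_T(f) = B_T(g \circ f)$ hold for arbitrary functions; in DB the same role is played by set intersection.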

Now we are ready to present a formal definition for the DB category: 

Theorem 3 The category DB is an extended symmetric category, closed by the functor
$T_e = (T_e^0, T_e^1) : (DB \downarrow DB) \to DB$, where $T_e^0 = B_T \psi$ is the object component of this
functor such that for any arrow f in DB, $T_e^0(J(f)) = \widetilde{f}$, while its arrow component
$T_e^1$ is defined as follows: for any arrow $(h_1; h_2) : J(f) \to J(g)$ in $DB \downarrow DB$, such
that $g \circ h_1 = h_2 \circ f$ in DB, holds

$T_e^1(h_1; h_2) = \bigcup_{\partial_0(q_{\widetilde{f}_i}) = \partial_1(q_{\widetilde{f}_i}) = \{v\}\ \&\ v \in \widetilde{h_2 \circ f}} \{q_{\widetilde{f}_i}\}$

The associative composition operator for objects $*$, defined for any fitted pair $g \circ f$ of
arrows, is the set intersection operator $\cap$.

Thus, $B_T(g) * B_T(f) = \widetilde{g} \cap \widetilde{f} = \widetilde{g \circ f} = B_T(g \circ f)$.

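Proof: Each object A has its identity (point-to-point) morphism
$id_A = \bigcup_{\partial_0(q_{A_i}) = \partial_1(q_{A_i}) = \{v\}\ \&\ v \in A} \{q_{A_i}\}$, and associativity holds:
$\widetilde{h \circ (g \circ f)} = \widetilde{h} \cap \widetilde{(g \circ f)} = \widetilde{h} \cap \widetilde{g} \cap \widetilde{f} = \widetilde{(h \circ g)} \cap \widetilde{f} = \widetilde{(h \circ g) \circ f}$.
Both arrows have the same source and target object, thus $h \circ (g \circ f) = (h \circ g) \circ f$. Thus, DB is a category.
It is easy to verify that $T_e$ is also a well defined functor. In fact, for any identity arrow
$(id_A; id_B) : J(f) \to J(f)$ it holds that
$T_e^1(id_A; id_B) = \bigcup_{\partial_0(q_{\widetilde{f}_i}) = \partial_1(q_{\widetilde{f}_i}) = \{v\}\ \&\ v \in \widetilde{id_B \circ f}} \{q_{\widetilde{f}_i}\} = id_{\widetilde{f}}$
is the identity arrow of $\widetilde{f}$. For any two arrows $(h_1; h_2) : J(f) \to J(g)$, $(l_1; l_2) : J(g) \to J(k)$,
it holds that
$\widetilde{T_e^1(l_1; l_2) \circ T_e^1(h_1; h_2)} = \widetilde{T_e^1(l_1; l_2)} \cap \widetilde{T_e^1(h_1; h_2)} = T(\widetilde{l_2 \circ g}) \cap T(\widetilde{h_2 \circ f}) = \widetilde{l_2} \cap \widetilde{g} \cap \widetilde{h_2} \cap \widetilde{f}$
= (by $h_2 \circ f = g \circ h_1$) $= \widetilde{l_2} \cap \widetilde{g} \cap \widetilde{g} \cap \widetilde{h_1}$ = (by $h_2 \circ f = g \circ h_1$) $= \widetilde{l_2} \cap \widetilde{h_2} \cap \widetilde{f} = \widetilde{l_2 \circ h_2 \circ f} = \widetilde{T_e^1(l_1 \circ h_1; l_2 \circ h_2)}$;
finally, $T_e^1(l_1; l_2) \circ T_e^1(h_1; h_2) = T_e^1(l_1 \circ h_1; l_2 \circ h_2)$. For any identity arrow $id_A$ it holds that
$T_e^0 J(id_A) = \widetilde{id_A} = TA \simeq A$ as well, thus an isomorphism $\varphi : T_e \circ \blacktriangle \simeq I_{DB}$ is
valid.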

□ 

Remark: It is easy to verify (from $\tau^{-1} \bullet \tau = \psi$) that for any given morphism
$f : A \to B$ in DB, the arrow $f_{ep} = \tau(J(f)) : A \twoheadrightarrow \widetilde{f}$ is an epimorphism, and
the arrow $f_{in} = \tau^{-1}(J(f)) : \widetilde{f} \hookrightarrow B$ is a monomorphism, so that any morphism f in
DB is a composition of an epimorphism and a monomorphism, $f = f_{in} \circ f_{ep}$, with the
intermediate object equal to its "information flux" $\widetilde{f}$, and with $f \approx f_{in} \approx f_{ep}$.
Let us prove that the equivalence relations on objects and morphisms are based on the
"inclusion" Partial Order (PO) relations, which define DB as a 2-category:

Proposition 8 The subcategory $DB_I \subseteq DB$, with $Ob_{DB_I} = Ob_{DB}$ and with monomorphic
arrows only, is a Partial Order category with the PO relation of "inclusion"
$A \preceq B$ defined by a monomorphism $f : A \hookrightarrow B$. The "inclusion" PO relations for
objects and arrows are defined as follows:

$A \preceq B$ iff $TA \subseteq TB$

$f \preceq g$ iff $\widetilde{f} \preceq \widetilde{g}$ (i.e., $\widetilde{f} \subseteq \widetilde{g}$)

they determine two observation equivalences, i.e.,

$A \simeq B$ (i.e., $A \approx B$) iff $A \preceq B$ and $B \preceq A$

$f \approx g$ iff $f \preceq g$ and $g \preceq f$

The power-view endofunctor $T : DB \to DB$ is a 2-endofunctor and a closure operator
for this PO relation: any object A such that A = TA will be called a "closed
object".

DB is a 2-category: 1-cells are its ordinary morphisms, while 2-cells (denoted by $\sqrt{\_}$)
are the arrows between ordinary morphisms: for any two morphisms $f, g : A \to B$
such that $f \preceq g$, a 2-cell arrow is the "inclusion" $\sqrt{\alpha} : f \preceq g$. Such a 2-cell arrow is
represented by an ordinary arrow in DB, $\alpha : \widetilde{f} \hookrightarrow \widetilde{g}$, where $\alpha = T_e^1(id_A; id_B)$.

Proof: The relation $A \preceq B$ is well defined: any monomorphism $f : A \hookrightarrow B$ is a unique
monomorphism (for any other monic arrow $g : A \hookrightarrow B$ it must hold that $\widetilde{g} = TA = \widetilde{f}$,
thus g = f). Consequently, between any two given objects in $DB_I$ there can exist at
most one arrow, so this is a PO category. The "inclusion" $A \preceq B$ is not a simple
set inclusion $\subseteq$ between elements of A and elements of B (this is the case only for
closed objects and, generally, $A \subseteq B$ implies $A \preceq B$, but not vice versa). The following
properties are valid:

1. $A \preceq B$ implies $TA \preceq TB$ (i.e., $TA \subseteq TB$), from the definition of $\preceq$: if all
elements of A can define only one part of B, then the set of views of A is a subset
of the set of views of B; T is a monotonic operator.

2. $A \preceq TA$, i.e., each element of A is also a view of A.

3. TA = TTA, as explained at the beginning of this paper.
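
The three properties listed above are the usual laws of a closure operator (monotone, extensive, idempotent) and can be checked on a toy model. The sketch below assumes, for illustration only, that T is approximated by closure under union of relations; the real power-view operator closes under all SPJRU queries, and `power_view` is a hypothetical name.

```python
from itertools import combinations


def power_view(relations):
    """Toy stand-in for T: all unions of the given relations, plus the empty one."""
    rels = list(relations)
    views = {frozenset()}
    for size in range(1, len(rels) + 1):
        for combo in combinations(rels, size):
            views.add(frozenset().union(*combo))
    return frozenset(views)


r1, r2 = frozenset({(1,)}), frozenset({(2,)})
A = frozenset({r1})
B = frozenset({r1, r2})

TA, TB = power_view(A), power_view(B)

monotone = (not A <= B) or TA <= TB     # property 1, checked for A included in B
extensive = A <= TA and B <= TB         # property 2
idempotent = power_view(TA) == TA and power_view(TB) == TB  # property 3
```

The check is made on one pair of sample databases only, so it illustrates the laws rather than establishing them in general.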

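Thus, T is a closure operator, and an object A such that A = TA is a closed object.
The rest of the proof comes directly from Proposition 2 and the definitions. Let us verify
that the arrow component of this endofunctor is a closure operator as well:

1. $f \preceq g$ implies $Tf \preceq Tg$ (i.e., from $f \preceq g$ it holds that $\widetilde{f} \subseteq \widetilde{g}$, thus $T\widetilde{f} \subseteq T\widetilde{g}$, i.e.,
$\widetilde{Tf} \subseteq \widetilde{Tg}$).

2. $f \preceq Tf$, from $\widetilde{f} \subseteq T\widetilde{f} = \widetilde{Tf}$.

3. $Tf = TTf$, in fact $\widetilde{Tf} = T\widetilde{f} = TT\widetilde{f} = T\widetilde{Tf} = \widetilde{TTf}$.

Notice that for each arrow f it holds (by the closure property of T) that $f \approx Tf$, i.e., that
$\widetilde{f} = T\widetilde{f} = \widetilde{Tf}$.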

It is easy to verify that DB is a 2-category with 0-cells (its objects), 1-cells (its ordinary
morphisms (mappings)) and 2-cells (arrows ("inclusions") between mappings).
The horizontal and vertical composition of 2-cells is just the composition of PO relations
$\preceq$: given $f, g, h : A \to B$ with 2-cells $\sqrt{\alpha} : f \preceq g$, $\sqrt{\beta} : g \preceq h$, then
their vertical composition is $\sqrt{\gamma} = \sqrt{\beta} \circ \sqrt{\alpha} : f \preceq h$; given $f, g : A \to B$ and
$h, l : B \to C$, with 2-cells $\sqrt{\alpha} : f \preceq g$, $\sqrt{\beta} : h \preceq l$, then, for a given composition
functor $\bullet : DB(A, B) \times DB(B, C) \to DB(A, C)$, their horizontal composition is
$\sqrt{\gamma} = \sqrt{\beta} \bullet \sqrt{\alpha} : h \circ f \preceq l \circ g$.
□ 

Example 5: Equivalent morphisms: for any view-map $q_{A_i} : A \to TA$ the equivalence
with another view-mapping $q_{B_j} : B \to TB$ is obtained when they produce the same
view.

Let us now see that each 2-cell may be represented by an equivalent ordinary morphism
(1-cell) (from $f \preceq g$ iff $\widetilde{f} \preceq \widetilde{g}$), and moreover, that we are able to treat the mappings
between mappings directly as morphisms of the DB category.
The categorial symmetry operator $T_e^0 J : Mor_{DB} \to Ob_{DB}$ produces, for any mapping
(morphism) f in DB, its "information flux" object $\widetilde{f}$ (i.e., the "conceptualized"
database of this mapping). Consequently, we can define a "mapping between mappings"
(which are 2-cells ("inclusions")) and also all higher n-cells [27] by their direct transposition
into a 1-cell morphism, but we are able to make more complex morphisms
between mappings as well.
□ 

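Example 6: Let us consider the two ordinary (1-cell) morphisms in DB, $f : A \to B$,
$g : C \to D$, such that $f \preceq g$. We want to show that its corresponding 1-cell
monomorphism $\alpha : \widetilde{f} \hookrightarrow \widetilde{g}$ is a result of the symmetric closure functor $T_e$. Let us
prove that for the two arrows $h_A = is_C \circ in_C \circ \tau(J(f))$ and $h_B = is_D \circ in_D \circ c_B \circ is_B$
(where $in_C : \widetilde{f} \hookrightarrow TC$ is a monomorphism (well defined, because $f \preceq g$ implies
$\widetilde{f} \subseteq \widetilde{g} \subseteq TC$), $is_C : TC \to C$ is an isomorphism, $is_B : B \to TB$ is an
isomorphism, $c_B : TB \twoheadrightarrow \widetilde{f}$ is an epimorphism, $in_D : \widetilde{f} \hookrightarrow TD$ is a monomorphism
(well defined, because $\widetilde{f} \subseteq \widetilde{g} \subseteq TD$), and $is_D : TD \to D$ is an isomorphism) it holds that
$g \circ h_A = h_B \circ f$: we have that
$\widetilde{h_A} = \widetilde{is_C} \cap \widetilde{in_C} \cap \widetilde{\tau(J(f))} = TC \cap T\widetilde{f} \cap \widetilde{f} = \widetilde{f}$ (because $TC \supseteq \widetilde{g} \supseteq \widetilde{f}$
and $T\widetilde{f} = \widetilde{f}$), and analogously $\widetilde{h_B} = T\widetilde{f} = \widetilde{f}$. Thus, $\widetilde{g \circ h_A} = \widetilde{h_B \circ f} = \widetilde{f}$, and
finally $g \circ h_A = h_B \circ f$.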

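Thus, there exists the arrow $(h_A; h_B) : J(f) \to J(g)$ in $DB \downarrow DB$. Let us prove that
$T_e^1(h_A; h_B)$ is a monomorphism as well, and that it holds that
$\alpha = T_e^1(h_A; h_B) : \widetilde{f} \hookrightarrow \widetilde{g}$: in fact, by definition,

$$T_e^1(h_A; h_B) = \bigcup_{\partial_0(q_j) = \partial_1(q_j) = \{v\}\ \&\ v \in \widetilde{h_B \circ f}} \{q_j\} = \bigcup_{\partial_0(q_j) = \partial_1(q_j) = \{v\}\ \&\ v \in \widetilde{f}} \{q_j\}$$

because $\widetilde{h_B \circ f} = \widetilde{f}$. Thus, $\widetilde{T_e^1(h_A; h_B)} = T\widetilde{f} = \widetilde{f}$ and, consequently, $T_e^1(h_A; h_B)$ is
a monomorphism.

In the particular case when A = C and B = D we obtain, for the 2-cell arrow
$\sqrt{\alpha} : f \preceq g$, its representation by the 1-cell arrow $\alpha = T_e^1(id_A; id_B) : \widetilde{f} \hookrightarrow \widetilde{g}$.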

□ 



4.2 Weak observational equivalence for databases 

A database instance can also have relations with tuples containing Skolem constants
(for example, the minimal Herbrand models for the Global (virtual) schema of some
Data integration system [21,28,29]).

In what follows we consider a recursively enumerable set of all Skolem constants as
marked (labeled) nulls $SK = \{\omega_0, \omega_1, \ldots\}$, disjoint from a domain set dom of all values
for databases, and we introduce a unary predicate $Val(\_)$, such that $Val(t)$ is true iff
$t \in dom$ (so, $Val(\omega_i)$ is false for any $\omega_i \in SK$).

Thus, we can define a new weak power-view operator for databases as follows: 

Definition 12. Weak power-view operator $T_w : Ob_{DB} \to Ob_{DB}$ is defined as
follows: for any database A in DB category it holds that:

$$T_w(A) \triangleq \{\, v \mid v \in T(A) \text{ and } \forall_{1 \le k \le |v|}\, \forall (t \in \pi_k(v))\, Val(t) \,\}$$

where $|v|$ is the number of attributes of the view v, and $\pi_k$ is the k-th projection operator
on relations.

We define a partial order relation $\preceq_w$ for databases:

$A \preceq_w B$ iff $T_w(A) \subseteq T_w(B)$

and we define a weak observational equivalence relation for databases:

$A \approx_w B$ iff $A \preceq_w B$ and $B \preceq_w A$.
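
The definitions above can be illustrated by a minimal Python sketch. It is only an approximation under stated assumptions: $T(A)$ is in general infinite, so the functions below work on a finite, explicitly given set of views; the class `Skolem` and the function names `val`, `weak_views`, `weakly_included` and `weakly_equivalent` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skolem:
    """A marked null omega_i (hypothetical encoding of an element of SK)."""
    index: int


def val(t):
    """Val(t): true for ordinary domain values, false for marked nulls."""
    return not isinstance(t, Skolem)


def weak_views(views):
    """Keep only the views (frozensets of tuples) that contain no marked null.

    `views` is assumed to be a finite stand-in for T(A).
    """
    return {v for v in views if all(val(t) for row in v for t in row)}


def weakly_included(views_a, views_b):
    """A <=_w B, checked as T_w(A) being a subset of T_w(B)."""
    return weak_views(views_a) <= weak_views(views_b)


def weakly_equivalent(views_a, views_b):
    """A ~_w B, checked as mutual weak inclusion."""
    return weakly_included(views_a, views_b) and weakly_included(views_b, views_a)
```

On such finite inputs the filter is idempotent, which mirrors the equality $T_w(T_w(A)) = T_w(A)$ stated in Proposition 9.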

The following properties hold for the weak partial order $\preceq_w$ w.r.t. the partial order $\preceq$
(we denote '$A \prec B$' iff $A \preceq B$ and not $A \approx B$):

Proposition 9 Let A and B be any two databases (objects in DB category), then:

1. $T_w(A) \approx A$, if A is a database without Skolem constants;
$T_w(A) \prec A$, otherwise

2. $A \preceq B$ implies $A \preceq_w B$

3. $A \approx B$ implies $A \approx_w B$

4. $T_w(T_w(A)) = T(T_w(A)) = T_w(T(A)) = T_w(A) \subseteq T(A)$;
thus, each object $D = T_w(A)$ is a closed object (i.e., $D = T(D)$) such that
$D \approx_w A$

5. $T_w$ is a closure operator w.r.t. the "weak inclusion" relation $\preceq_w$

Proof: 1. From $T_w(A) \subseteq T(A)$ ($T_w(A) = T(A)$ only if A is without Skolem
constants).

2. If $A \prec B$ then $T(A) \subset T(B)$, thus $T_w(T(A)) \subseteq T_w(T(B))$, i.e., $A \preceq_w B$.

3. Directly from (4) and the fact that $A \approx B$ iff $A \preceq B$ and $B \preceq A$.

4. It holds from the definitions of the operators T and $T_w$: $T_w(T_w(A)) = T(T_w(A))$ because
$T_w(A)$ is the set of views of A without Skolem constants, and from (1). Moreover,
$T_w(T(A)) = \{ v \mid v \in TT(A) \text{ and } \forall_{1 \le k \le |v|} \forall (t \in \pi_k(v))\, Val(t) \}$
$= \{ v \mid v \in T(A) \text{ and } \forall_{1 \le k \le |v|} \forall (t \in \pi_k(v))\, Val(t) \} = T_w(A)$, from $T = TT$.
Let us show that $T_w(T_w(A)) = T_w(A)$. For every view $v \in T_w(T_w(A))$, from
$T_w(T_w(A)) = T(T_w(A)) \subseteq TA$ it holds that $v \in TA$, and from the fact that v is
without Skolem constants it follows that $v \in T_w(A)$. The converse is obvious.

5. We have that $A \preceq_w T_w(A)$, that $A \preceq_w B$ implies $T_w(A) \preceq_w T_w(B)$, and that
$T_w(T_w(A)) = T_w(A)$. Thus, $T_w$ is a closure operator.

□ 

Notice that from point 4, the partial order "$\preceq$" is a stronger discriminator for databases
than the weak partial order "$\preceq_w$", i.e., we can have two non-isomorphic objects
$A \prec B$ that are weakly equivalent, $A \approx_w B$ (for example when $A = T_w(B)$ and B is
a database with Skolem constants). Let us extend the notion of the type operator $T_w$ into
the notion of the endofunctor of DB category:

Theorem 4 There exists the weak power-view endofunctor $T_w = (T_w^0, T_w^1) : DB \to DB$,
such that

1. for any object A, the object component $T_w^0$ is equal to the type operator $T_w$.

2. for any morphism $f : A \to B$, the arrow component $T_w^1$ is defined by

$$T_w(f) = T_w^1(f) = inc_B^{inv} \circ T^1(f) \circ inc_A$$

where $inc_A : T_w(A) \hookrightarrow T(A)$ is a monomorphism (set inclusion) and
$inc_B^{inv} : T(B) \twoheadrightarrow T_w(B)$ is an epimorphism (reversed monomorphism $inc_B$).

3. Endofunctor $T_w$ preserves the properties of arrows, i.e., if a morphism f has a
property P (monic, epic, isomorphic), then also $T_w(f)$ has the same property: let
$P_{mono}$, $P_{epi}$ and $P_{iso}$ be the monomorphic, epimorphic and isomorphic properties
respectively, then the following formula is true

$$\forall (f \in Mor_{DB})\,(P_{mono}(f) \equiv P_{mono}(T_w f) \wedge P_{epi}(f) \equiv P_{epi}(T_w f) \wedge P_{iso}(f) \equiv P_{iso}(T_w f))$$

4. There exist the natural transformations $\xi : T_w \to T$ (natural monomorphism)
and $\xi^{-1} : T \to T_w$ (natural epimorphism), such that for any object A, $\xi(A) = inc_A$
is a monomorphism and $\xi^{-1}(A) = inc_A^{inv}$ is an epimorphism such that

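Proof: It is easy to verify that for any two arrows $f : A \to B$, $g : B \to C$, it holds
that $\widetilde{T_w(g \circ f)} \subseteq T(T_w(B)) = \widetilde{inc_B \circ inc_B^{inv}}$, thus

$$T_w(g \circ f) = inc_C^{inv} \circ T^1(g \circ f) \circ inc_A = inc_C^{inv} \circ T^1(g) \circ T^1(f) \circ inc_A$$
$$= inc_C^{inv} \circ T^1(g) \circ inc_B \circ inc_B^{inv} \circ T^1(f) \circ inc_A = T_w(g) \circ T_w(f).$$

Thus, it is an endofunctor. The rest is easy to verify.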

□ 

Like the monad $(T, \eta, \mu)$ and comonad $(T, \eta^C, \mu^C)$ of the endofunctor T, we can
define such structures for the weak endofunctor $T_w$ as well:

Proposition 10 The weak power-view endofunctor $T_w = (T_w^0, T_w^1) : DB \to DB$
defines the monad $(T_w, \eta_w, \mu_w)$ and the comonad $(T_w, \eta_w^C, \mu_w^C)$ in DB, such that
$\eta_w = \xi^{-1} \bullet \eta : I_{DB} \to T_w$ is a natural epimorphism and $\eta_w^C = \eta^C \bullet \xi : T_w \to I_{DB}$
is a natural monomorphism ('$\bullet$' is a vertical composition for natural transformations),
while $\mu_w : T_w T_w \to T_w$ and $\mu_w^C : T_w \to T_w T_w$ are equal to the natural identity
transformation $id_{T_w} : T_w \to T_w$ (because $T_w = T_w T_w$).

Proof: It is easy to verify that all commutative diagrams of the monad and the comonad 
are diagrams composed by identity arrows. 

□ 

5 Categorial Semantics for Data Integration/Exchange 

Data exchange [29] is the problem of taking data structured under a source schema and
creating an instance of a target schema that reflects the source data as accurately as
possible. Data integration [21] is instead the problem of combining data residing at different
sources, and providing the user with a unified global schema of this data. Thus, in this
framework the concepts are defined in a more abstract way than in the instance database
framework represented in the "computation" DB category. Consequently, we require
an interpretation mapping from the schema level into the instance level, which will be given
categorially by functors.

5.1 Data Integration/Exchange Framework

We formalize a data integration system $\mathcal{I}$ in terms of a triple $(\mathcal{G}, \mathcal{S}, \mathcal{M})$, where

- $\mathcal{G} = (\mathcal{G}_T, \Sigma_T)$ is the target schema, expanded by the new unary predicate $Val(\_)$
such that $Val(c)$ is true if $c \in dom$, expressed in a language $L_\mathcal{G}$ over an alphabet
$A_\mathcal{G}$, where $\mathcal{G}_T$ is the schema and $\Sigma_T$ are its integrity constraints. The alphabet
comprises a symbol for each element of $\mathcal{G}$ (i.e., relation if $\mathcal{G}$ is relational, class if $\mathcal{G}$
is object-oriented, etc.).

- $\mathcal{S}$ is the source schema, expressed in a language $L_\mathcal{S}$ over an alphabet $A_\mathcal{S}$. The
alphabet $A_\mathcal{S}$ includes a symbol for each element of the sources. While the source
integrity constraints may play an important role in deriving dependencies in $\mathcal{M}$,
they do not play any direct role in the data integration/exchange framework and we
may ignore them.

- $\mathcal{M}$ is the mapping between $\mathcal{G}$ and $\mathcal{S}$, constituted by a set of assertions of the forms

(1) $q_\mathcal{S} \rightsquigarrow q_\mathcal{G}$, $q_\mathcal{G} \rightsquigarrow q_\mathcal{S}$

where $q_\mathcal{S}$ and $q_\mathcal{G}$ are two queries of the same arity, over the source schema $\mathcal{S}$ and
over the target schema $\mathcal{G}$ respectively. Queries $q_\mathcal{S}$ are expressed in a query language
$L_{\mathcal{M},\mathcal{S}}$ over the alphabet $A_\mathcal{S}$, and queries $q_\mathcal{G}$ are expressed in a query language
$L_{\mathcal{M},\mathcal{G}}$ over the alphabet $A_\mathcal{G}$. Intuitively, an assertion $q_\mathcal{S} \rightsquigarrow q_\mathcal{G}$ specifies that the
concept represented by the query $q_\mathcal{S}$ over the sources corresponds to the concept
in the target schema represented by the query $q_\mathcal{G}$ (similarly for an assertion of type
$q_\mathcal{G} \rightsquigarrow q_\mathcal{S}$).

- Queries $q_C(\mathbf{x})$, where $\mathbf{x} = x_1, \ldots, x_k$ is a non-empty set of variables, over the global
schema are conjunctive queries. We will use, for every original query $q_C(\mathbf{x})$, only a
lifted query over the global schema, denoted by q, such that
$q := q_C(\mathbf{x}) \wedge Val(x_1) \wedge \ldots \wedge Val(x_k)$.

In order to define the semantics of a data integration system, we start from the data at
the sources, and specify which are the data that satisfy the global schema. A source
database $\mathcal{D}$ for $\mathcal{I} = (\mathcal{G}, \mathcal{S}, \mathcal{M})$ is constituted by one relation $r^\mathcal{D}$ for each source r
in $\mathcal{S}$ (sources that are not relational may be suitably presented in the relational form
by wrapper programs). We call global database for $\mathcal{I}$, or simply database for $\mathcal{I}$, any
database for $\mathcal{G}$. A database B for $\mathcal{I}$ is said to be legal with respect to $\mathcal{D}$ if:

- B satisfies the integrity constraints of $\mathcal{G}$;

- B satisfies $\mathcal{M}$ with respect to $\mathcal{D}$.

We restrict our attention to sound views only, which are typically considered the
most natural ones in a data integration setting [21,30].

In order to obtain an answer to a lifted query q from a data integration system, a tuple 
of constants is considered an answer to this query only if it is a certain answer, i.e., it 
satisfies the query in every legal global database. 
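
As a small illustration of this definition, the following Python sketch computes the certain answers as the intersection of the answers over a family of legal databases. It is only a sketch under a strong assumption: the family is finite and given explicitly, whereas in general there are infinitely many legal databases. The name `certain_answers` and the encoding of a database as a dictionary from relation names to sets of tuples are hypothetical.

```python
def certain_answers(query, legal_databases):
    """Tuples returned by `query` in every database of a finite family.

    `query` is any function from a database to a set of tuples;
    `legal_databases` is assumed to be a non-empty finite collection.
    """
    databases = list(legal_databases)
    if not databases:
        raise ValueError("at least one legal database is required")
    answers = set(query(databases[0]))
    for db in databases[1:]:
        answers &= set(query(db))  # keep only tuples true in this database too
    return answers
```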

We may try to infer all the legal databases for $\mathcal{I}$ and compute the tuples that satisfy the
lifted query q in all such legal databases. However, the difficulty here is that, in
general, there is an infinite number of legal databases. Fortunately we can define another
universal (canonical) database $can(\mathcal{I}, \mathcal{D})$, that has the interesting property of faithfully
representing all legal databases. The construction of the canonical database is similar to
the construction of the restricted chase of a database described in [31].
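Example 7: Let us consider the following Global-and-Local-As-View (GLAV) case
when each dependency in $\mathcal{M}$ will be a tuple-generating dependency (tgd) of the form

(2) $\forall \mathbf{x}\,(\exists \mathbf{y}\, q_\mathcal{S}(\mathbf{x}, \mathbf{y}) \Rightarrow \exists \mathbf{z}\, q_\mathcal{G}(\mathbf{x}, \mathbf{z}))$

where the formula $q_\mathcal{S}(\mathbf{x}, \mathbf{y})$ is a conjunction of atomic formulas over $\mathcal{S}$ and $q_\mathcal{G}(\mathbf{x}, \mathbf{z})$ is a
conjunction of atomic formulas over $\mathcal{G}$. Moreover, each target dependency in $\Sigma_T$ will
be either a tuple-generating dependency (tgd) of the form

(3) $\forall \mathbf{x}\,(\exists \mathbf{y}\, \phi_\mathcal{G}(\mathbf{x}, \mathbf{y}) \Rightarrow \exists \mathbf{z}\, \psi_\mathcal{G}(\mathbf{x}, \mathbf{z}))$

(we will consider only the class of weakly-full tgds for which query answering is
decidable, i.e., when the right-hand side has no existentially quantified variables, and if each
$y_i \in \mathbf{y}$ appears at most once in the left side),
or an equality-generating dependency (egd):

(4) $\forall \mathbf{x}\,(\phi_\mathcal{G}(\mathbf{x}) \Rightarrow (x_1 = x_2))$

where the formulae $\phi_\mathcal{G}(\mathbf{x})$ and $\psi_\mathcal{G}(\mathbf{x}, \mathbf{y})$ are conjunctions of atomic formulae over $\mathcal{G}$,
and $x_1, x_2$ are among the variables in $\mathbf{x}$.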

□ 

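Notice that this example includes as special cases both LAV (when each assertion is of
the form $q_\mathcal{S}(\mathbf{x}) = s(\mathbf{x})$, for some relation s in $\mathcal{S}$ and $q_\mathcal{S} \rightsquigarrow q_\mathcal{G}$) and GAV (when each
assertion is of the form $q_\mathcal{G}(\mathbf{x}, \mathbf{z}) = g(\mathbf{x}, \mathbf{z})$, for some relation g in $\mathcal{G}$ and $q_\mathcal{G} \rightsquigarrow q_\mathcal{S}$)
data integration mappings in which the views are sound.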

5.2 A categorial semantics of database integrity constraints

It is natural for a database schema $(\mathcal{A}, \Sigma_A)$, where $\mathcal{A}$ is a schema and $\Sigma_A$ are the
database integrity constraints, to take $\Sigma_A$ to be a set of tuple-generating dependencies (tgds)
and equality-generating dependencies (egds). These two classes of dependencies together
comprise the embedded implication dependencies (EID) [32], which seem to include
essentially all of the naturally-occurring constraints on relational databases.
Let $(\mathcal{A}, \Sigma_A)$ be a database schema expressed in a language $L_A$ over an alphabet $A_A$,
where $\mathcal{A}$ is a schema and $\Sigma_A = \Sigma_A^{tgd} \cup \Sigma_A^{egd}$ are the database integrity constraints (a set
of EIDs).

We can represent it by a schema mapping $\Sigma_A : \mathcal{A} \to \mathcal{A}$, and its denotation in DB
can be given by an arrow, as follows:

Proposition 11 If for a database schema $(\mathcal{A}, \Sigma_A)$ there exists a model (instance-database)
A that satisfies all integrity constraints $\Sigma_A = \Sigma_A^{tgd} \cup \Sigma_A^{egd}$, then there exists an
interpretation R-algebra $\alpha$ and its extension, a functor $\alpha^* : Sch(\mathcal{A}, \Sigma_A) \to DB$, where
$Sch(\mathcal{A}, \Sigma_A)$ is the category derived from the graph (arrow) $\Sigma_A : \mathcal{A} \to \mathcal{A}$ (composed
by the single node $\mathcal{A}$, the arrow $\Sigma_A$ and the identity arrow $id_\mathcal{A} : \mathcal{A} \to \mathcal{A}$ equal to an
empty set of integrity constraints; composition of arrows in this category corresponds
to the union operator), such that:

- $\alpha^*(\mathcal{A}) = A$ (set of relations $\alpha(R_i)$ for each predicate symbol $R_i$ in the schema $\mathcal{A}$)

- $\alpha^*(id_\mathcal{A}) = id_A : A \to A$ (identity arrow in DB of the object A)

- $\alpha^*(\Sigma_A) = (f_{tgd} \cup f_{egd}) : A \to A$, where:

Let $R_{1i}$ be the set of predicate letters used in a query $q_{A_{1i}}(\mathbf{x})$, where $\|q_{A_{1i}}(\mathbf{x})\|$ is its
obtained view, and let $q_i \in O(R_{1i}, r'_i)$ be mapped into a view computation $\alpha(q_i)$ with
$\alpha(\partial_1(q_i)) = \alpha(r'_i) = \|q_{A_{1i}}(\mathbf{x})\|$, then

1. for each i-th tgd $q_{A_{1i}}(\mathbf{x}) \Rightarrow q_{A_{2i}}(\mathbf{x}, \mathbf{y})$ in $\Sigma_A^{tgd}$, we introduce a new
predicate symbol $r_i$ with the interpretation $\alpha(r_i) = \|q_{A_{2i}}(\mathbf{x}, \mathbf{y})\|$ (the view of A
obtained from a query $q_{A_{2i}}(\mathbf{x}, \mathbf{y})$), and

ftgd ^ O Uao(„.) = {^.} & d,{v,} = {r'J ■ A 

where $\alpha(v_i)$ is an inclusion-case tuple-mapping function (in 5)

2. for each i-th egd $q_{A_i}(\mathbf{x}) \Rightarrow (x_1 = x_2)$ in $\Sigma_A^{egd}$, we introduce a new
predicate symbol $r_i$ with the interpretation $\alpha(r_i) = \|q_{A_i}(\mathbf{x})\|$ and

fegd = iSA^ O Uaofe,)=a(rO & d,{<!Y,) = {±} ° ■ A A 

where $q_{Y_i} : TA \to TA$ is a Yes/No arrow in DB, and $\alpha(q_i) : A \to TA$ is a
view-map arrow in DB.

$is_A : A \simeq TA$ is an isomorphism in DB category, and $is_A^{-1}$ is its inverse arrow.

Proof: It is easy to verify that if $\alpha^*$ satisfies the conditions in points 1 and 2, then all
constraints in $\Sigma_A$ are satisfied, so that this functor is a Lawvere's model of $\mathcal{A}$. Notice
that for a Yes/No arrow in DB category $q_{Y_i} : TA \to TA$, the condition $\partial_1(q_{Y_i}) = \perp^0$
means that for a view $\alpha(r_i) = \|q_{A_i}(\mathbf{x})\|$ it holds that $(x_1 = x_2)$, i.e., the answer of the query
$q_{A_i}(\mathbf{x}) \Rightarrow (x_1 = x_2)$ is Yes, and $\widetilde{f_{egd}} = \perp^0$, for each egd constraint in $\Sigma_A^{egd}$.

5.3 GLAV Categorial semantics

Let us consider the most general case of GLAV mapping:

Definition 13. For a general GLAV data integration/exchange system $\mathcal{I} = (\mathcal{B}, \mathcal{A}, \mathcal{M})$,
when each tgd maps a view of one database into a view of another database, we define
the following two schema mappings, $f_A : \mathcal{A} \to \mathcal{C}$, $f_B : \mathcal{B} \to \mathcal{C}$, where $\mathcal{C}$ is a new
logical schema composed by a new predicate symbol $r_i(\mathbf{x})$ for a formula $q_B(\mathbf{x}, \mathbf{z})$,
for every i-th tgd $\forall \mathbf{x}\,(\exists \mathbf{y}\, q_A(\mathbf{x}, \mathbf{y}) \Rightarrow \exists \mathbf{z}\, q_B(\mathbf{x}, \mathbf{z}))$ in $\mathcal{M}$:

Ia = U • -.A^C} 

($R_{1i}$, $R_{2i}$ are, respectively, the set of predicate symbols used in the query $q_A(\mathbf{x}, \mathbf{y})$ and
the set of predicate letters used in the query $q_B(\mathbf{x}, \mathbf{z})$)

Note: in the particular cases (GAV and LAV), when a view of one database is mapped
into one element of another database, we obtain only a mapping arrow between two
schemas. In fact, in $\mathcal{M} : \mathcal{A} \to \mathcal{B}$, for GAV the schema $\mathcal{A}$ is the source database and $\mathcal{B}$ is
the global schema; for LAV it is the opposite.

We can generalize this framework into a complex data integration/exchange system

$\mathcal{I} = (\mathcal{B}_k, \mathcal{A}_k, \mathcal{M}_k),\ k \in N$.

Let $Sch(\mathcal{I})$ be the category generated by the sketch (enriched graph) $\mathcal{I}$. We can now
define a mapping functor from the schema-level category into the instance-level category
DB:

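Theorem 5 If for each $(\mathcal{B}_k, \mathcal{A}_k, \mathcal{M}_k)$ of the data integration/exchange system
$\mathcal{I} = (\mathcal{B}_k, \mathcal{A}_k, \mathcal{M}_k), k \in N$, for a given instance A of the schema $\mathcal{A}$ there exists the universal
(canonical) instance $B = can(\mathcal{I}, \mathcal{D})$ of the global schema $\mathcal{B}$ legal w.r.t. A, then there
exists the interpretation R-algebra $\alpha$ and its extension, the functor (categorial
Lawvere's model) $\alpha^* : Sch(\mathcal{I}) \to DB$, defined as follows:
For every single data integration/exchange system $(\mathcal{B}, \mathcal{A}, \mathcal{M})$:

1. for any schema arrow $f_B : \mathcal{B} \to \mathcal{C}$ in $Sch(\mathcal{I})$ it holds that $B = \alpha^*(\mathcal{B}) =
can(\mathcal{I}, \mathcal{D})$, and $C = \alpha^*(\mathcal{C})$ is the database instance of the schema $\mathcal{C}$ composed by:
for each i-th tgd $\forall \mathbf{x}\,(\exists \mathbf{y}\, q_A(\mathbf{x}, \mathbf{y}) \Rightarrow \exists \mathbf{z}\, q_B(\mathbf{x}, \mathbf{z}))$ in $\mathcal{M}$ we have an element
$\alpha(r_i) = \pi_\mathbf{x}(\|q_B(\mathbf{x}, \mathbf{z})\|)$ (the projection on $\mathbf{x}$ of the view obtained from the query
$q_B(\mathbf{x}, \mathbf{z})$ over $B = can(\mathcal{I}, \mathcal{D})$) in C, so that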

a{fB) ^ U {a(toj -.B^C} 

($R_{1i}$, $R_{2i}$ are, respectively, the set of predicate letters used in the query $q_A(\mathbf{x}, \mathbf{y})$
and the set of predicate letters used in the query $q_B(\mathbf{x}, \mathbf{z})$);
and for any schema arrow $f_A : \mathcal{A} \to \mathcal{C}$ in $Sch(\mathcal{I})$, it holds: $A = \alpha^*(\mathcal{A})$ is a
given instance of the source schema $\mathcal{A}$, and

$$\alpha^*(f_A) = \bigcup_{\partial_0(q_i) = R_{1i}\ \&\ \partial_1(v_i) = \{r_i\}} \{\alpha(v_i \cdot q_i) : A \to C\}$$

where $\alpha(v_i) : \alpha(\partial_1(q_i)) \to \alpha(r_i)$ (with $\alpha(\partial_1(q_i)) = \pi_\mathbf{x}(\|q_A(\mathbf{x}, \mathbf{y})\|)$, the
projection on $\mathbf{x}$ of the view obtained from the query $q_A(\mathbf{x}, \mathbf{y})$) is a function:

- inclusion case, if the i-th tgd has the same direction of its implication symbol (w.r.t.
the arrow $f_A$)

- inverse-inclusion case, if the i-th tgd has the opposite direction of its implication
symbol

- equal case, if the i-th tgd is an equivalence dependency relation.

2. Let $f_A^{inv} : C \to A$ be the equivalent reverse arrow of $\alpha^*(f_A)$ and $f_B^{inv} : C \to B$
be the equivalent reverse arrow of $\alpha^*(f_B)$; then, for each system $(\mathcal{B}, \mathcal{A}, \mathcal{M})$ we
obtain the equivalent direct mapping morphisms $f = f_B^{inv} \circ \alpha^*(f_A) : A \to B$
and $f_{inv} = f_A^{inv} \circ \alpha^*(f_B) : B \to A$ in DB category.

Proof: Directly from the mapping properties of DB morphisms and from the equiv- 
alent reversibility of its morphisms: each morphism in DB represents a denotational 
semantics for a well defined exchange problem between two database instances, so we 
can define a functor for such an exchange problem. Such a functor, between the schema 
integration level (theory) and the instance level (which is a model of this theory) is just 
an extended interpretation function of a particular model of R-algebra. 
□ 

Remark: A solution for a data integration/exchange system does not always exist (if
there exists a failing finite chase, see [28,29] for more information), but if it exists then
it is a canonical universal solution, and in that case there also exists a mapping functor
of the theorem above. So, this theorem can be abbreviated by: "given a data exchange
problem graph $\mathcal{I} = (\mathcal{B}_k, \mathcal{A}_k, \mathcal{M}_k), k \in N$, then:

$\exists \alpha^* : Sch(\mathcal{I}) \to DB$ iff there exists a universal (canonical) solution for the
corresponding data integration/exchange problem".

The theorem above shows how a GLAV mapping can be equivalently represented by LAV
and GAV mappings, and shows that query answering under integrity constraints can be done in the
same way in LAV and GAV systems.

5.4 Query rewriting in GAV with (foreign) key constraints 

The characteristics of the components of a data integration system in this approach [28]
are as follows:

- The global schema, expanded by the new unary predicate $Val(\_)$ such that $Val(c)$
is true if $c \in dom$, is expressed in the relational model with $\Sigma_T$ (key and foreign
key constraints). We assume that in such a global schema $\mathcal{G}$ there is exactly one key
constraint for each relation.

1. Key constraints: given a relation r in the schema, a key constraint over r is
expressed in the form $key(r) = \mathbf{A}$, where $\mathbf{A}$ is a set of attributes of r. Such
a constraint is satisfied in an instance-database A if for each $t_1, t_2 \in r^A$, with
$t_1 \neq t_2$, we have $t_1[\mathbf{A}] \neq t_2[\mathbf{A}]$, where $t[\mathbf{A}]$ is the projection of the tuple t
over $\mathbf{A}$.

2. Foreign key constraints: a foreign key constraint is a statement of the form
$r_1[\mathbf{A}] \subseteq r_2[\mathbf{B}]$, where $r_1, r_2$ are relations, $\mathbf{A}$ is a sequence of distinct
attributes of $r_1$, and $\mathbf{B}$ is $key(r_2)$, i.e., the sequence $[1, \ldots, h]$ constituting the
key of $r_2$. Such a constraint is satisfied in a database A if for each tuple $t_1$ in
$r_1^A$ there exists a tuple $t_2$ in $r_2^A$ such that $t_1[\mathbf{A}] = t_2[\mathbf{B}]$.

- The mapping $\mathcal{M}$ is defined following the GAV (global-as-view) approach: to each
relation r of the global schema $\mathcal{G}$ we associate a query $\rho(r)$ over the source schema
$\mathcal{S}$; we assume that this query preserves the key constraint of r.

- For each relation r of the global schema, we may compute the relation $r^\mathcal{D}$ by
evaluating the query $\rho(r)$ over the source database $\mathcal{D}$, and compute the relation Val for
all constants in dom. The various relations so obtained define what we call the
retrieved global database $ret(\mathcal{I}, \mathcal{D})$. Notice that, since we assume that $\rho(r)$ has been
designed so as to resolve all key conflicts regarding r, the retrieved global database
satisfies all key constraints in $\mathcal{G}$.
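
The two kinds of constraints listed above can be checked on finite relations with a few lines of Python. This is a minimal sketch under assumptions that are not in the paper: relations are sets of tuples, attributes are identified by 0-based positions (the text uses 1-based positions), and the function names are hypothetical.

```python
def project(t, attrs):
    """t[A]: projection of tuple t over the attribute positions in attrs."""
    return tuple(t[i] for i in attrs)


def satisfies_key(relation, key_attrs):
    """key(r) = A holds iff distinct tuples never agree on the key attributes."""
    seen = set()
    for t in relation:  # a set of tuples, so all tuples are distinct
        k = project(t, key_attrs)
        if k in seen:
            return False
        seen.add(k)
    return True


def satisfies_foreign_key(r1, attrs_a, r2, key_b):
    """r1[A] is included in r2[B]: every projection of r1 occurs as a key of r2."""
    keys = {project(t, key_b) for t in r2}
    return all(project(t, attrs_a) in keys for t in r1)
```

For instance, with `r = {("a", "b")}` and `s = set()`, the foreign key from the second attribute of `r` to the first attribute of `s` is violated, which is exactly the situation repaired by the rule used to build the canonical database.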



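[Figure 5 (diagram lost in extraction): at the logical level, in $Sch(\mathcal{I})$, the arrows
$\mathcal{M} : \mathcal{S} \to \mathcal{G}_T$ and $\Sigma_T : \mathcal{G}_T \to \mathcal{G}$; at the instance level, in DB, their functorial
interpretations $\alpha^*(\mathcal{M}) = f_M$ and $\alpha^*(\Sigma_T) = f_\Sigma$ between $\alpha^*(\mathcal{S}) = \mathcal{D}$,
$\alpha^*(\mathcal{G}_T) = ret(\mathcal{I}, \mathcal{D})$ and $\alpha^*(\mathcal{G}) = can(\mathcal{I}, \mathcal{D})$.]

Fig. 5. Functorial translation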



In our case, with integrity constraints and with sound mapping, the semantics of a data
integration system $\mathcal{I}$ is specified in terms of a set of legal global instance-databases,
namely, those databases (they exist iff $\mathcal{I}$ is consistent w.r.t. $\mathcal{D}$, i.e., iff $ret(\mathcal{I}, \mathcal{D})$ does
not violate any key constraint in $\mathcal{G}$) that are supersets of the retrieved global database.
In [28], given the retrieved global database $ret(\mathcal{I}, \mathcal{D})$, we may construct inductively
the canonical database $can(\mathcal{I}, \mathcal{D})$ by starting from $ret(\mathcal{I}, \mathcal{D})$ and repeatedly applying
the following rule:

if $(x_1, \ldots, x_h) \in r_1^{can(\mathcal{I},\mathcal{D})}[\mathbf{A}]$, $(x_1, \ldots, x_h) \notin r_2^{can(\mathcal{I},\mathcal{D})}[\mathbf{B}]$, and the foreign
key constraint $r_1[\mathbf{A}] \subseteq r_2[\mathbf{B}]$ is in $\Sigma_T \subseteq \mathcal{G}$,
then insert in $r_2^{can(\mathcal{I},\mathcal{D})}$ the tuple t such that

- $t[\mathbf{B}] = (x_1, \ldots, x_h)$, and

- for each i such that $1 \le i \le arity(r_2)$, and i not in $\mathbf{B}$, $t[i] = f_{r_2,i}(x_1, \ldots, x_h)$.

Notice that the above rule does enforce the satisfaction of the foreign key constraint
$r_1[\mathbf{A}] \subseteq r_2[\mathbf{B}]$ by adding a suitable tuple in $r_2$: the key of the new tuple is determined
by the values in $r_1[\mathbf{A}]$, and the values of the non-key attributes are formed by means of
the Skolem function symbols $f_{r_2,i}$.

Based on the results in [28], $can(\mathcal{I}, \mathcal{D})$ is an appropriate database for answering queries
in a data integration system. Notice that the terms involving Skolem functions are never
part of certain answers. Thus, the lifted queries q use the $Val(\_)$ predicate in order to
eliminate the tuples with Skolem values in $can(\mathcal{I}, \mathcal{D})$.

Consequently, at the logic level, this GAV data integration system can be represented
by the graph composed by two arrows (Figure 5), $\mathcal{M} : \mathcal{S} \to \mathcal{G}_T$ and $\Sigma_T : \mathcal{G}_T \to \mathcal{G}$
($Sch(\mathcal{I})$ denotes the category derived by this graph).

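Definition 14. Functorial interpretation of this logic schema into the denotational semantic
domain DB, $\alpha^* : Sch(\mathcal{I}) \to DB$, is defined by two corresponding arrows (Fig. 5)
$f_M : \mathcal{D} \to ret(\mathcal{I}, \mathcal{D})$, $f_\Sigma : ret(\mathcal{I}, \mathcal{D}) \to can(\mathcal{I}, \mathcal{D})$, where $\alpha^*(\mathcal{S}) = \mathcal{D}$ is the
extension of the source database $\mathcal{D}$, $\alpha^*(\mathcal{G}_T) = ret(\mathcal{I}, \mathcal{D})$ is the retrieved global database,
$\alpha^*(\mathcal{G}) = \alpha^*(\mathcal{G}_T, \Sigma_T) = can(\mathcal{I}, \mathcal{D})$ is the universal (canonical) instance of the global
schema with the integrity constraints, and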

Jm = U{ iDi I where di{qT)i) = {p^ir)}, do{qDi) is the set of all predicate symbols 
in the query p{r), {p{r) ^ r) G A^} 

fs = U{"("fc ■ Iretk) I doiqretk) = dii^retk) = W} , r' G ret{I,T)), where a{vk) 
is an inclusion-case tuple-mapping function (in\5}for r'}, 

because $ret(\mathcal{I}, \mathcal{D})$ and $can(\mathcal{I}, \mathcal{D})$ have the same set of predicate symbols, but the
extension of each of them in $ret(\mathcal{I}, \mathcal{D})$ is a subset of the extension in $can(\mathcal{I}, \mathcal{D})$.

Query rewriting coalgebra semantics: 

The naive computation is impractical, because it requires the building of a canonical
database, which is generally infinite. In order to overcome this problem, a query
rewriting algorithm [28] consists of two separate phases.

1. Instead of referring explicitly to the canonical database for query answering, this
algorithm transforms the original lifted query q into a new query $exp_\mathcal{G}(q)$ over a
global schema, called the expansion of q w.r.t. $\mathcal{G}$, such that the answer to $exp_\mathcal{G}(q)$
over the retrieved global database is equal to the answer to q over the canonical
database.

2. In order to avoid the building of the retrieved global database, the algorithm does not
evaluate $exp_\mathcal{G}(q)$ over the retrieved global database. Instead, it unfolds
$exp_\mathcal{G}(q)$ to a new query, called $unf_\mathcal{M}(exp_\mathcal{G}(q))$, over the source relations on
the basis of $\mathcal{M}$, and then uses the unfolded query $unf_\mathcal{M}(exp_\mathcal{G}(q))$ to access the
sources.

Fig. 6. Query answering process

Figure 6 shows the basic idea of this approach (taken from [28]). In order to obtain the
certain answers $q^{\mathcal{I},\mathcal{D}}$, the user lifted query q could in principle be evaluated (dashed
arrow) over the (possibly infinite) canonical database $can(\mathcal{I}, \mathcal{D})$, which is generated
from the retrieved global database $ret(\mathcal{I}, \mathcal{D})$. In turn, $ret(\mathcal{I}, \mathcal{D})$ can be obtained from
the source database $\mathcal{D}$ by evaluating the queries of the mapping. This query answering
process instead expands the query according to the constraints in $\mathcal{G}$, then unfolds it
according to $\mathcal{M}$, and then evaluates it on the source database.

Let us show how the symbolic diagram in Fig. 6 can be effectively represented by
commutative diagrams in DB, corresponding to the homomorphisms between T-coalgebras
representing equivalent queries over these three instance-databases; each query in DB
category is represented by an arrow, and can be composed with arrows that semantically
denote mappings and integrity constraints.

Theorem 6 Let $\mathcal{I} = (\mathcal{G}, \mathcal{S}, \mathcal{M})$ be a data integration system, $\mathcal{D}$ a source database for
$\mathcal{I}$, $ret(\mathcal{I}, \mathcal{D})$ the retrieved global database for $\mathcal{I}$ w.r.t. $\mathcal{D}$, and $can(\mathcal{I}, \mathcal{D})$ the universal
(canonical) database for $\mathcal{I}$ w.r.t. $\mathcal{D}$.

Then, a denotational semantics for the query rewriting algorithms $exp_\mathcal{G}(q)$ and $unf_\mathcal{M}(q)$,
for a query expansion and query unfolding respectively, is given by two (partial)
functions on T-coalgebras:

$exp_\mathcal{G}(\_) \triangleq Tf_\Sigma^{inv} \circ \_ \circ f_\Sigma$ and
$unf_\mathcal{M}(exp_\mathcal{G}(\_)) \triangleq T(f_\Sigma \circ f_M)^{inv} \circ \_ \circ (f_\Sigma \circ f_M)$

where $f_M$ and $f_\Sigma$ are given by a functorial translation of the mapping $\mathcal{M}$ and integrity
constraints $\Sigma_T$.

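Proof: Let us denote by $q_E = exp_\mathcal{G}(q)$ and $q_U = unf_\mathcal{M}(exp_\mathcal{G}(q))$ the expanded
and successively unfolded queries of the original lifted query q. Then, by the
query-rewriting theorem the diagrams

[Commutative diagram (layout lost in extraction): bottom row $\mathcal{D} \to ret(\mathcal{I}, \mathcal{D}) \to can(\mathcal{I}, \mathcal{D})$
with arrows $f_M$ and $f_\Sigma$; top row $T\mathcal{D} \to Tret(\mathcal{I}, \mathcal{D}) \to Tcan(\mathcal{I}, \mathcal{D})$; vertical arrows
$q_U$, $q_E$ and $q$.]

based on the composition of T-coalgebra homomorphisms $f_M : (\mathcal{D}, q_U) \to (ret(\mathcal{I}, \mathcal{D}), q_E)$
and $f_\Sigma : (ret(\mathcal{I}, \mathcal{D}), q_E) \to (can(\mathcal{I}, \mathcal{D}), q)$, commute. It is easy to verify the first two
facts. Then, from the composition of these two functions, we obtain

$$unf_\mathcal{M}(exp_\mathcal{G}(\_)) = unf_\mathcal{M}(\_)\,exp_\mathcal{G}(\_) = Tf_M^{inv} \circ (exp_\mathcal{G}(\_)) \circ f_M$$
$$= Tf_M^{inv} \circ (Tf_\Sigma^{inv} \circ \_ \circ f_\Sigma) \circ f_M = (Tf_M^{inv} \circ Tf_\Sigma^{inv}) \circ \_ \circ (f_\Sigma \circ f_M)$$
$$= T(f_\Sigma \circ f_M)^{inv} \circ \_ \circ (f_\Sigma \circ f_M)$$

because of the duality and functorial property of T.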

□ 



5.5 Fixpoint operator for finite canonical solution 

The database instance $can(\mathcal{I}, \mathcal{D})$ can be an infinite one (see an example below), thus
impossible to materialize for real applications. Thus, in this paragraph we introduce a
new approach to the canonical model, closer to the data exchange approach [29]. It
is not restricted to the existence of query-rewriting algorithms, and thus can be used in
order to define a Coherent Closed World Assumption for data integration systems also in
the absence of query-rewriting algorithms [33]. The construction of the canonical model
for a global schema of the logical theory $\mathcal{P}_\mathcal{G}$ for a data integration system is similar to
the construction of the canonical database $can(\mathcal{I}, \mathcal{D})$ described in [28]. The difference
lies in the fact that, in the construction of this revisited canonical model, denoted by
$can_M(\mathcal{I}, \mathcal{D})$, for a global schema, fresh marked null values (set $SK = \{\omega_0, \omega_1, \ldots\}$ of
Skolem constants) are used instead of terms involving Skolem functions, following the
idea of construction of the restricted chase of a database described in [31]. Thus, we
enlarge the set of ordinary constants dom of our language to $\Gamma_U = dom \cup SK$.
Another motivation for concentrating on canonical models is the view [34] that many
logic programs are appropriately thought of as having two components, an intensional
database (IDB) that represents the reasoning component, and the extensional database
(EDB) that represents a collection of facts. Over the course of time, we can "apply" the
same IDB to many quite different EDBs. In this context it makes sense to think of the
IDB as implicitly defining a transformation from an EDB to a set of derived facts: we
would like the set of derived facts to be the canonical model.

Now we construct inductively the revisited canonical database model $can_M(\mathcal{I}, \mathcal{D})$ over
the domain $\Gamma_U$ by starting from $ret(\mathcal{I}, \mathcal{D})$ and repeatedly applying the following rule:

if $(x_1, \ldots, x_h) \in r_1^{can_M(\mathcal{I},\mathcal{D})}[\mathbf{A}]$, $(x_1, \ldots, x_h) \notin r_2^{can_M(\mathcal{I},\mathcal{D})}[\mathbf{B}]$, and the
foreign key constraint $r_1[\mathbf{A}] \subseteq r_2[\mathbf{B}]$ is in $\mathcal{G}$,
then insert in $r_2^{can_M(\mathcal{I},\mathcal{D})}$ the tuple t such that

- $t[\mathbf{B}] = (x_1, \ldots, x_h)$, and

- for each i such that $1 \le i \le arity(r_2)$, and i not in $\mathbf{B}$, $t[i] = \omega_k$, where
$\omega_k$ is a fresh marked null value.

Note that the above rule does enforce the satisfaction of the foreign key constraint
$r_1[\mathbf{A}] \subseteq r_2[\mathbf{B}]$, by adding a suitable tuple in $r_2$: the key of the new tuple is
determined by the values in $r_1[\mathbf{A}]$, and the values of the non-key attributes are formed by
means of the fresh marked values $\omega_k$ during the application of the rule above.
The rule above defines the "immediate consequence" monotonic operator $T_B$ defined
by:

$$T_B(I) = \{\, A \mid A \in B_\mathcal{G},\ A \leftarrow A_1 \wedge \ldots \wedge A_n \text{ is a ground instance of a rule in } \mathcal{P}_\mathcal{G} \text{ and } \{A_1, \ldots, A_n\} \subseteq I \,\}$$

where, at the beginning, $I = ret(\mathcal{I}, \mathcal{D})$, and $B_\mathcal{G}$ is a Herbrand base for a global schema.
Thus, $can_M(\mathcal{I}, \mathcal{D})$ is a least fixpoint of this immediate consequence operator.
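
For a finite set of ground rules, the iteration of an immediate consequence operator up to its least fixpoint can be sketched in Python as follows. This is the textbook naive iteration, given here only as an illustration under assumptions: rules are already ground, each rule is a hypothetical pair `(head, body)` of hashable atoms, the iteration is made inflationary (the start facts are kept), and `max_rounds` is a safety bound that is not part of the paper.

```python
def immediate_consequence(ground_rules, facts):
    """Heads of the ground rules whose body atoms all belong to `facts`."""
    return {head for head, body in ground_rules if set(body) <= facts}


def least_fixpoint(ground_rules, start, max_rounds=1000):
    """Iterate I := I + T_B(I) from `start` until nothing new is derived."""
    current = set(start)
    for _ in range(max_rounds):
        following = current | immediate_consequence(ground_rules, current)
        if following == current:
            return current  # fixpoint reached
        current = following
    raise RuntimeError("no fixpoint reached within max_rounds")
```

With fresh marked nulls the set of ground rule instances is not finite in general, which is why the construction may never terminate.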
Example 8: Suppose that we have two relations r and s in $\mathcal{G}$, both of arity 2 and having
as key the first attribute, and that the following dependencies hold on $\mathcal{G}$:

$r[2] \subseteq s[1]$, $s[1] \subseteq r[1]$.

Suppose that the retrieved global database stores a single tuple (a, b) in r. Then, by
applying the above rule, we insert the tuple $(b, \omega_1)$ in s; successively we add $(b, \omega_2)$
in r, then $(\omega_2, \omega_3)$ in s, and so on. Observe that the two dependencies are cyclic, and
in this case the construction of the canonical database requires an infinite sequence of
applications of the rules. The following table represents the computation of the canonical
database $can_M(\mathcal{I}, \mathcal{D})$:

    r                        s
    a, b                     b, $\omega_1$
    b, $\omega_2$            $\omega_2$, $\omega_3$
    $\omega_2$, $\omega_4$   ...
    ...                      ...


Thus, the canonical model $can_M(\mathcal{I}, \mathcal{D})$ is a legal database model for the global schema.
The certain answers to the original user query $q(\mathbf{x})$, $\mathbf{x} = \{x_1, \dots, x_k\}$, over the global
schema are equal to the answer $q_L(\mathbf{x})^{can_M(\mathcal{I},\mathcal{D})}$ of the lifted query
$q_L(\mathbf{x}) = q(\mathbf{x}) \wedge Val(x_1) \wedge \dots \wedge Val(x_k)$ over this canonical model. Thus, if it were
possible to materialize this canonical model, the certain answers could be obtained over such
a database. Often this is not possible because (as in the example above) the canonical model
is infinite. In that case, we can use the revisited fixpoint semantics described in [35], based
on the fact that, after some point, the new tuples added to a canonical model introduce
only new Skolem constants, which are not useful for obtaining certain answers (those true
in all models of a database). In fact, Skolem constants are not part of any certain answer
to a conjunctive query. Consequently, we are able to obtain a finite subset of the canonical
database that is large enough to obtain all certain answers.
Let us denote such a finite database by $C_M(\mathcal{I}, \mathcal{D})$, where
$r = \{(a, b), (b, \omega_2), (\omega_2, \omega_4)\}$, $s = \{(b, \omega_1), (\omega_2, \omega_3)\}$ is a finite least fixpoint which
can be used in order to obtain certain answers to lifted queries.

□ 

In fact, we introduced marked null values (instead of Skolem functions) in order to
define and materialize such a finite database: it is not a model of the data integration
system (which is infinite), but it has all the necessary query-answering properties: it is able
to give all certain answers to conjunctive queries over the global schema. Thus it can be
materialized and used for query answering, instead of query-rewriting algorithms.
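
To make the role of the lifted query concrete, here is a small Python sketch that evaluates two conjunctive queries over the finite database of Example 8 and then keeps only the answers free of marked nulls. The representation is an assumption for illustration: the class Null, the helper names val and lifted, and the two sample queries are hypothetical, and Val is taken to hold exactly for ordinary domain constants.

```python
from typing import NamedTuple


class Null(NamedTuple):
    """A marked null omega_k (hypothetical representation)."""
    k: int


def val(x):
    """Val(x): assumed true for domain constants, false for marked nulls."""
    return not isinstance(x, Null)


def lifted(answers):
    """Keep only the answer tuples whose components all satisfy Val."""
    return {t for t in answers if all(val(x) for x in t)}


w1, w2, w3, w4 = Null(1), Null(2), Null(3), Null(4)

# The finite database of Example 8.
r = {("a", "b"), ("b", w2), (w2, w4)}
s = {("b", w1), (w2, w3)}

# q1(x, y) <- r(x, y)
q1 = {(x, y) for (x, y) in r}
# q2(x) <- r(x, y), with y existentially quantified
q2 = {(x,) for (x, _y) in r}

certain_q1 = lifted(q1)
certain_q2 = lifted(q2)
```

Here certain_q1 contains only the stored tuple, while certain_q2 also returns b: the chase has recorded that b must occur as a key of r in every legal database, even though its second attribute is unknown.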

The procedure for computing a canonical database for the global schema, based
on the "immediate consequence" monotonic operator $T_B$ defined above, can be
intuitively described as follows: it starts with an instance $\langle I, \emptyset \rangle$ which consists of
$I$, an instance of the source schema, and of the empty instance $\emptyset$ for the target (global
schema). Then we chase $\langle I, \emptyset \rangle$ by applying all the dependencies in $\Sigma_{st}$ (a finite set
of source-to-target dependencies) and $\Sigma_t$ (a finite set of target integrity dependencies)
as long as they are applicable. This process may fail (if an attempt to identify two
domain constants is made in order to define a homomorphism between two consecutive
target instances) or it may never terminate. Let $J_i$ and $J_{i+1}$ denote two consecutive
target instances of this process ($J_0 = \emptyset$). We introduce a function $Ch : \Theta \to \Theta$,
where $\Theta$ is the set of all pairs $\langle I, J \rangle$, with $I$ a source instance and $J$ one of the
target instances generated from $I$, such that:

$$\langle I, J_{i+1} \rangle = Ch(\langle I, J_i \rangle) \supseteq \langle I, J_i \rangle$$

This function is monotonic. Let us define the sets

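$$S_i = T_w(\pi_2(\langle I, J_i \rangle)) = T_w(J_i)$$

and the fixpoint operator $\Psi : \Theta_w \to \Theta_w$, where $\Theta_w = \{ T_w(\pi_2(S)) \mid S \in \Theta \}$,
such that $\Psi(T_w(\pi_2(\langle I, J_i \rangle))) = T_w(\pi_2(Ch(\langle I, J_i \rangle)))$, i.e., $\Psi T_w \pi_2 = T_w \pi_2 Ch : \Theta \to \Theta_w$,
and with the least fixpoint $C_M(\mathcal{I}, \mathcal{D}) = S$, $S = \Psi(S)$.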

Proposition 12 [35] Let $\langle I, \emptyset \rangle$ be an initial instance that consists of $I$, a finite
instance of the source schema, and of an empty instance $\emptyset$ for the target (global schema).
Then, there exists the least fixpoint $S$ of the function $\Psi : \Theta_w \to \Theta_w$, which is equal to
$S = T_w \pi_2 Ch^n(\langle I, \emptyset \rangle)$ for a finite $n$.
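
The chase function $Ch$ can be sketched in Python as follows. This is only an illustration under stated assumptions: facts are tuples (relation, value1, value2), the source-to-target dependencies are taken to be plain copy rules, the target dependencies are inclusions of the form $rel_1[pos] \subseteq rel_2[1]$ with the first attribute of $rel_2$ as key, and the names ch, chase, copy_rules and inclusions are hypothetical. The failing case (identifying two domain constants) is not modelled. The loop tests stabilization of $J$ itself, which is a stronger condition than stabilization of $T_w(J)$ used in the proposition, so for cyclic dependencies it simply stops at the step bound.

```python
from itertools import count


def ch(pair, copy_rules, inclusions, fresh):
    """One chase step on <I, J>: returns <I, J'> with J a subset of J'."""
    source, target = pair
    # Source-to-target dependencies, assumed here to be plain copies.
    for rel, *vals in sorted(source):
        fact = (copy_rules[rel], *vals)
        if fact not in target:
            return source, target | {fact}
    # Target inclusion dependencies rel1[pos] ⊆ rel2[1] (key of rel2).
    for rel1, pos, rel2 in inclusions:
        keys = {f[1] for f in target if f[0] == rel2}
        for f in sorted(target):
            if f[0] == rel1 and f[pos] not in keys:
                # Add a tuple keyed by the missing value, with a fresh null.
                return source, target | {(rel2, f[pos], f"w{next(fresh)}")}
    return pair


def chase(source, copy_rules, inclusions, max_steps):
    """Iterate ch from <I, {}>; returns (J, n), with n None if not stabilized."""
    fresh = count(1)
    pair = (frozenset(source), frozenset())
    for n in range(max_steps):
        nxt = ch(pair, copy_rules, inclusions, fresh)
        assert pair[1] <= nxt[1]  # each step only adds facts
        if nxt == pair:
            return pair[1], n
        pair = nxt
    return pair[1], None


src = {("r0", "a", "b")}
acyclic, n_acyclic = chase(src, {"r0": "r"}, [("r", 2, "s")], 10)
cyclic, n_cyclic = chase(src, {"r0": "r"}, [("r", 2, "s"), ("s", 1, "r")], 5)
```

With the single dependency $r[2] \subseteq s[1]$ the chase stabilizes after two productive steps. With the cyclic pair of dependencies from Example 8 it keeps producing new nulls, and the five facts obtained within the bound coincide with the finite database given in that example, with the nulls written as "w1" to "w4".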

Consequently, we can demonstrate the following algebraic property for the closure
operator $T_w$.

Proposition 13 The closure operator $T_w$ is algebraic, that is, given any infinite
canonical database $can(\mathcal{I}, \mathcal{D})$, it holds that

$$T_w(can(\mathcal{I}, \mathcal{D})) = \bigcup \{ T_w(X') \mid X' \subseteq_\omega can(\mathcal{I}, \mathcal{D}) \}$$

where $X' \subseteq_\omega can(\mathcal{I}, \mathcal{D})$ means that $X'$ is a finite subset of $can(\mathcal{I}, \mathcal{D})$.

Proof: In fact, for $X' = \pi_2 Ch^n(\langle I, \emptyset \rangle)$ with a finite $n$ and, consequently, finite $X'$,
such that $X'$ is the least fixpoint of $\Psi$, i.e., $X' = \Psi(X')$, it holds that
$T_w(can(\mathcal{I}, \mathcal{D})) = T_w(X')$.
□ 

Notice that each infinite canonical database of a global database schema $\mathcal{G}$ is weakly
equivalent to its finite subset (an instance-database) $C_M(\mathcal{I}, \mathcal{D}) = X'$, where
$X' = \Psi(X')$ is a finite subset of $can(\mathcal{I}, \mathcal{D})$ that is not a model of $\mathcal{G}$ but is
obtained as the least fixpoint of the operator $\Psi$.

Thus, $can(\mathcal{I}, \mathcal{D}) \approx_w C_M(\mathcal{I}, \mathcal{D})$, where $can(\mathcal{I}, \mathcal{D})$ is an infinite model of $\mathcal{G}$, and
$C_M(\mathcal{I}, \mathcal{D})$ is a finite object weakly equivalent to it in the DB category.

6 Conclusion 

We have presented only a fundamental overview of a new approach to database
concepts developed from an observational equivalence based on views. The main
intuitive result of the basic database category DB obtained here, which is more
appropriate than the category Set used for categorical Lawvere theories, is the
possibility of making synthetic representations of database mappings, and of queries
over databases, in a graphical form, such that all mapping (and query) arrows can be
composed in order to obtain complex database mapping diagrams. Consider, for example,
P2P systems or mappings between databases in a complex data warehouse. Formally, it
is possible to develop a graphic (sketch-based) tool for a meta-mapping description of
complex (and partial) mappings in various contexts, with a formal mathematical background.
These and some other results suggest the need for further investigation of:

- The semantics for Merging and Matching database operators based on a complete
database lattice, as in [36].

- The expressive power of the DB category with Universal Algebra considerations.

- Monad-based consideration of the category DB as a computation model for view-based
database mappings.

- A complete investigation of all paradigms for database mappings.

- A formalization in this context of query processing in a P2P framework.

We have not yet considered other important properties of this DB category, such
as algebraic properties for finitary representation of infinite databases, that is, locally
finite representation properties [37], or monoidal enrichments, based on the concept of
matching of two databases, which can be used for enriched Lawvere theories of
sketches [38,39,40] in a very expressive database algebraic specification for complex
inter-database mappings.